Note
This page is a work in progress. It may contain incomplete or incorrect information.
Monitoring Best Practices¶
Introduction¶
Observability (also known as “monitoring”) lets you determine if the Daml Enterprise solution is healthy or not. If the state is not healthy, observability helps diagnose the root cause. There are three parts to observability: metrics, logs, and traces. These are described in this section.
To avoid becoming overwhelmed by the number of metrics and log messages, follow these steps:
Start with the section Hands-On with the Daml Enterprise - Observability Example below. It is a shortcut to learning what is important, and it serves as a starting point and inspiration when building your metric monitoring.
For an overview of how most metrics are exposed, read the section Golden Signals and Key Metrics Quick Start below. It describes the philosophy behind metric naming and labeling.
The remaining sections provide references to more detailed information.
Hands-On with the Daml Enterprise - Observability Example¶
The Daml Enterprise - Observability Example GitHub repository provides a complete reference example for exploring the metrics that Daml Enterprise exposes. You can use it to explore the collection, aggregation, filtering, and visualization of metrics. It is self-contained, with the following components:
An example Docker Compose file to create a runtime for all the components
Some shell scripts to generate requests to the Daml Enterprise solution
A Prometheus config file to scrape the metrics data
Grafana template files to visualize the metrics in a meaningful way, such as shown below in the example dashboard

Dashboard with metrics¶
Golden Signals and Key Metrics Quick Start¶
The best practice for monitoring a microservices application is an approach known as the Golden Signals, or the RED method. In this approach, metric monitoring determines whether the application is healthy and, if not healthy, which service is the root cause of the issue. The Golden Signals are supported for all HTTP and gRPC endpoints. Key metrics specific to Daml Enterprise are also available. These are described below.
The following Golden Signal metrics for each HTTP and gRPC API are available:
Input request rate, as a counter
Error rate, as a counter (discussed below)
Latency (the time to process a request), as a histogram
Size of the payload, as a counter, following the Apache HTTP precedent
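Because the request and error signals are exposed as counters, a rate has to be derived from the difference between two scrapes. The following Python sketch shows that arithmetic under stated assumptions: the function names and sample values are hypothetical, and a monitoring backend would normally compute this for you (for example with a rate query in Prometheus).

```python
def per_second_rate(prev_value, prev_ts, curr_value, curr_ts):
    """Average per-second increase of a counter between two scrapes."""
    elapsed = curr_ts - prev_ts
    if elapsed <= 0:
        raise ValueError("scrapes must be in increasing time order")
    delta = curr_value - prev_value
    if delta < 0:
        # A counter that went down was reset (for example, the process
        # restarted), so count the current value as the increase.
        delta = curr_value
    return delta / elapsed


def error_ratio(error_rate, request_rate):
    """Fraction of requests that failed; 0.0 when there was no traffic."""
    return error_rate / request_rate if request_rate > 0 else 0.0


# Hypothetical counter samples taken 30 seconds apart.
request_rate = per_second_rate(1200, 0.0, 1500, 30.0)
error_rate = per_second_rate(12, 0.0, 15, 30.0)
print(request_rate, error_rate, error_ratio(error_rate, request_rate))
```

Handling a counter reset explicitly matters because a restarted node starts counting from zero again, which would otherwise show up as a negative rate.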
You can filter or aggregate each metric using its accompanying labels. The instrumentation labels added to each HTTP API metric are as follows:
http_verb: the HTTP verb (for example: GET, POST)
http_status: the status code (for example: 200, 401, 403, 504)
host: the host identifier
daml_version: the Daml release number
service: a string to identify what Daml service or Canton component is running in this process (for example: participant, sequencer, json_api)
path: the request made to the endpoint (for example: /v2/commands/submit-and-wait, /v2/state/active-contracts)
The gRPC protocol is layered on top of HTTP/2, so certain labels (such as daml_version and service) from the above section are included. The labels added by default to each gRPC API metric are as follows:
canton_version: the Canton protocol version
grpc_code: the human-readable status code for gRPC (for example: OK, CANCELLED, DEADLINE_EXCEEDED)
The type of the client/server gRPC request, under the labels grpc_client_type and grpc_server_type
The protobuf package and service names, under the labels grpc_service_name and grpc_method_name
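To illustrate filtering and aggregating by label, the following Python sketch parses a few lines in the Prometheus text exposition format and sums a counter over the samples whose labels match. It is a minimal sketch, not a full parser: the metric name daml_http_requests_total and all sample values are hypothetical, and only the label names come from the lists above.

```python
import re

# name{label="value",...} value   (the label block is optional)
SAMPLE_LINE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_PAIR = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')

# Hypothetical scrape output; the metric name and values are made up.
SCRAPE = """
# TYPE daml_http_requests_total counter
daml_http_requests_total{http_verb="POST",http_status="200",service="json_api",path="/v2/commands/submit-and-wait"} 40
daml_http_requests_total{http_verb="POST",http_status="504",service="json_api",path="/v2/commands/submit-and-wait"} 2
daml_http_requests_total{http_verb="POST",http_status="200",service="json_api",path="/v2/state/active-contracts"} 8
"""


def parse_samples(text):
    """Return (name, labels, value) tuples, skipping comments and blank lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_PAIR.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def total(samples, name, **wanted):
    """Sum the samples of one metric whose labels equal all wanted values."""
    return sum(
        value
        for sample_name, labels, value in samples
        if sample_name == name
        and all(labels.get(key) == val for key, val in wanted.items())
    )


samples = parse_samples(SCRAPE)
print(total(samples, "daml_http_requests_total"))
print(total(samples, "daml_http_requests_total", http_status="504"))
print(total(samples, "daml_http_requests_total", path="/v2/state/active-contracts"))
```

Passing no label constraints aggregates across all label combinations, while each additional keyword narrows the selection, which mirrors how label matchers are used in a metrics query language.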
The following other key metrics are monitored:
A binary gauge indicates whether the node is healthy or not healthy. This can also be used to infer which node is passive in a highly available configuration because it will show as not being healthy, while the active node is always healthy.
A binary gauge signals whether a node is active or passive, for identifying which node is the active node.
A binary gauge detects when pruning is occurring.
Each participant node measures the count of the inflight (dirty) requests so the user can see if the maxDirtyRequests limit is close to being hit. The metrics are: canton_dirty_requests and canton_max_dirty_requests.
Each participant node records the distribution of events (updates) received by the participant and allows drill-down by event type (package upload, party creation, or transaction), status (success or failure), participant ID, and application ID (if available). The counter is called daml_indexer_events_total.
The ledger event requests are totaled in a counter called daml_indexer_metered_events_total.
JVM garbage collection metrics are collected.
This list is not exhaustive. It highlights the most important metrics.
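As an example of how the inflight request metrics can be used together, the following Python sketch compares a sample of canton_dirty_requests against canton_max_dirty_requests. The function names, the 80% warning fraction, and the sample values are assumptions chosen for illustration, and the sketch assumes the limit is reported as a positive number.

```python
def dirty_request_headroom(dirty, max_dirty):
    """Fraction of the limit still unused (1.0 = idle, 0.0 = at the limit)."""
    if max_dirty <= 0:
        raise ValueError("this sketch assumes a positive limit")
    return max(0.0, 1.0 - dirty / max_dirty)


def is_close_to_limit(dirty, max_dirty, warn_fraction=0.8):
    """True when the inflight count has reached warn_fraction of the limit."""
    return dirty >= warn_fraction * max_dirty


# Hypothetical samples of canton_dirty_requests and canton_max_dirty_requests.
print(dirty_request_headroom(125, 500))
print(is_close_to_limit(450, 500))
```

Alerting on a fraction of the limit rather than on an absolute count keeps the rule valid if the limit is reconfigured later.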
Set Up Metrics Scraping¶
Enable the Prometheus Reporter¶
Prometheus is recommended for metrics reporting. Other reporters (jmx, graphite, and csv) are supported, but they are deprecated. Any such reporter should be migrated to Prometheus.
Prometheus can be enabled using:
canton.monitoring.metrics.reporters = [{
  type = prometheus
  address = "localhost" // default
  port = 9000 // default
}]
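Once the reporter is enabled, the exposed metrics can be inspected with any HTTP client. The following Python sketch builds the scrape URL from the address and port shown above and downloads the metric lines. The /metrics path and the helper names are assumptions, so adjust them to match your deployment.

```python
from urllib.request import urlopen


def metrics_url(address="localhost", port=9000, path="/metrics"):
    """Build the scrape URL from the reporter's address and port.

    The "/metrics" path is an assumption, not taken from the configuration.
    """
    return f"http://{address}:{port}{path}"


def fetch_metric_lines(url, timeout=5.0):
    """Download the exposition text and return its non-comment lines."""
    with urlopen(url, timeout=timeout) as response:
        text = response.read().decode("utf-8")
    return [line for line in text.splitlines() if line and not line.startswith("#")]
```

With a node running, calling fetch_metric_lines(metrics_url()) returns one string per exposed sample, which is a quick way to confirm that the reporter is reachable before pointing Prometheus at it.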
Prometheus-Only Metrics¶
Some metrics are available only when using the Prometheus reporter. These metrics include common gRPC and HTTP metrics (which help you to measure the four golden signals), and JVM GC and memory usage metrics (if enabled). The metrics are documented in detail below.
Any metric marked with * is available only when using the Prometheus reporter.
Deprecated Reporters¶
JMX-based reporting (for testing purposes only) can be enabled using:
canton.monitoring.metrics.reporters = [{ type = jmx }]
Additionally, metrics can be written to a file:
canton.monitoring.metrics.reporters = [{
  type = jmx
}, {
  type = csv
  directory = "metrics"
  interval = 5s // default
  filters = [{
    contains = "canton"
  }]
}]
or reported via Graphite (to Grafana) using:
canton.monitoring.metrics.reporters = [{
  type = graphite
  address = "localhost" // default
  port = 2003
  prefix.type = hostname // default
  interval = 30s // default
  filters = [{
    contains = "canton"
  }]
}]
When using the graphite or the csv reporter, Canton periodically evaluates all metrics matching the given filters. Filter for only those metrics that are relevant to you.
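To make the effect of a contains filter concrete, the following Python sketch applies the same kind of substring test to a list of metric names. It is only an illustration of the idea: the helper name is hypothetical, and the assumptions that an empty filter list keeps everything and that a metric is kept when any filter matches are not confirmed by this page.

```python
def keep_metric(metric_name, filters):
    """Return True if the metric passes at least one 'contains' filter.

    Assumption: an empty filter list keeps every metric.
    """
    if not filters:
        return True
    return any(f["contains"] in metric_name for f in filters)


# Hypothetical metric names for illustration.
names = ["canton.dirty_requests", "daml.cache.hits", "jvm.gc.time"]
filters = [{"contains": "canton"}]

kept = [name for name in names if keep_metric(name, filters)]
print(kept)
```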
In addition to Canton metrics, the process can also report Daml metrics (of the Ledger API server). Optionally, JVM metrics can be included using:
canton.monitoring.metrics.report-jvm-metrics = yes // default no
Metrics¶
The following sections contain the common metrics exposed for Daml services supporting a Prometheus metrics reporter.
For the metric types referenced below, see the relevant Prometheus documentation.
Participant Metrics¶
daml.cache.evicted_weight¶
Summary: The sum of weights of cache entries evicted.
Description: The total weight of the entries evicted from the cache.
Type: counter
Qualification: Debug
daml.cache.evictions¶
Summary: The number of the evicted cache entries.
Description: When an entry is evicted from the cache, the counter is incremented.
Type: counter
Qualification: Debug
daml.cache.hits¶
Summary: The number of cache hits.
Description: When a cache lookup encounters an existing cache entry, the counter is incremented.
Type: counter
Qualification: Debug
daml.cache.misses¶
Summary: The number of cache misses.
Description: When a cache lookup first encounters a missing cache entry, the counter is incremented.
Type: counter
Qualification: Debug
daml.db-storage.general.executor.exectime¶
Summary: Execution time metric for database tasks
Description: The time a task is running on the database is measured using this metric.
Type: timer
Qualification: Debug
daml.db-storage.general.executor.load¶
Summary: Load of database pool
Description: Database queries run as tasks on an async executor. This metric shows the current number of queries running in parallel divided by the number of database connections for this database connection pool.
Type: gauge
Qualification: Saturation
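The load value is a simple ratio, which the following Python sketch reproduces. The function names, the sample numbers, and the threshold of 1.0 for treating the pool as saturated are assumptions for illustration only.

```python
def pool_load(running_queries, pool_connections):
    """Queries running in parallel divided by the pool's connection count."""
    if pool_connections <= 0:
        raise ValueError("the pool must have at least one connection")
    return running_queries / pool_connections


def is_saturated(load, threshold=1.0):
    """True when the load has reached the chosen threshold."""
    return load >= threshold


# Hypothetical values: 6 queries in flight on a pool of 8 connections.
load = pool_load(6, 8)
print(load, is_saturated(load))
```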
daml.db-storage.general.executor.queued¶
Summary: Number of database access tasks waiting in queue
Description: Database access tasks get scheduled in this queue and get executed using one of the existing asynchronous sessions. A large queue indicates that the database connection is not able to deal with the large number of requests. Note that the queue has a maximum size. Tasks that do not fit into the queue will be retried, but won’t show up in this metric.
Type: counter
Qualification: Saturation
daml.db-storage.general.executor.running¶
Summary: Number of database access tasks currently running
Description: Database access tasks run on an async executor. This metric shows the current number of tasks running in parallel.
Type: gauge
Qualification: Debug
daml.db-storage.general.executor.waittime¶
Summary: Scheduling time metric for database tasks
Description: Every database query is scheduled using an asynchronous executor with a queue. The time a task is waiting in this queue is monitored using this metric.
Type: timer
Qualification: Debug
daml.db-storage.write.executor.exectime¶
Summary: Execution time metric for database tasks
Description: The time a task is running on the database is measured using this metric.
Type: timer
Qualification: Debug
daml.db-storage.write.executor.load¶
Summary: Load of database pool
Description: Database queries run as tasks on an async executor. This metric shows the current number of queries running in parallel divided by the number of database connections for this database connection pool.
Type: gauge
Qualification: Saturation
daml.db-storage.write.executor.queued¶
Summary: Number of database access tasks waiting in queue
Description: Database access tasks get scheduled in this queue and get executed using one of the existing asynchronous sessions. A large queue indicates that the database connection is not able to deal with the large number of requests. Note that the queue has a maximum size. Tasks that do not fit into the queue will be retried, but won’t show up in this metric.
Type: counter
Qualification: Saturation
daml.db-storage.write.executor.running¶
Summary: Number of database access tasks currently running
Description: Database access tasks run on an async executor. This metric shows the current number of tasks running in parallel.
Type: gauge
Qualification: Debug
daml.db-storage.write.executor.waittime¶
Summary: Scheduling time metric for database tasks
Description: Every database query is scheduled using an asynchronous executor with a queue. The time a task is waiting in this queue is monitored using this metric.
Type: timer
Qualification: Debug
daml.db.commit*¶
Summary: The time needed to perform the SQL query commit.
Description: This metric measures the time it takes to commit an SQL transaction relating to the <operation>. It roughly corresponds to calling commit() on a DB connection.
Type: timer
Qualification: Debug
- Labels:
name: The operation/pool for which the metric is registered.
daml.db.compression*¶
Summary: The time needed to decompress the SQL query result.
Description: Some index database queries that target contracts involve a decompression step. For such queries this metric represents the time it takes to decompress contract arguments retrieved from the database.
Type: timer
Qualification: Debug
- Labels:
name: The operation/pool for which the metric is registered.
daml.db.exec*¶
Summary: The time needed to run the SQL query and read the result.
Description: This metric encompasses the time measured by query and commit metrics. Additionally it includes the time needed to obtain the DB connection, optionally roll it back and close the connection at the end.
Type: timer
Qualification: Debug
- Labels:
name: The operation/pool for which the metric is registered.
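Since the exec time encompasses the query and commit times, subtracting those two from it gives a rough estimate of the time spent obtaining, rolling back, and closing the connection. The following Python sketch shows this arithmetic with hypothetical timings for a single operation; the function name is an assumption.

```python
def connection_overhead(exec_seconds, query_seconds, commit_seconds):
    """Estimate the part of exec time not spent in the query or the commit."""
    overhead = exec_seconds - (query_seconds + commit_seconds)
    # Measurement noise can make the difference slightly negative.
    return max(0.0, overhead)


# Hypothetical timings, in seconds, for one operation.
print(connection_overhead(0.050, 0.030, 0.008))
```

A large overhead relative to the query time suggests that the bottleneck is connection handling rather than the SQL itself.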
daml.db.query*¶
Summary: The time needed to run the SQL query.
Description: This metric measures the time it takes to execute a block of code (on a dedicated executor) related to the <operation> that can issue multiple SQL statements such that all run in a single DB transaction (either committed or aborted).
Type: timer
Qualification: Debug
- Labels:
name: The operation/pool for which the metric is registered.
daml.db.translation*¶
Summary: The time needed to turn serialized Daml-LF values into in-memory objects.
Description: Some index database queries that target contracts and transactions involve a Daml-LF translation step. For such queries this metric stands for the time it takes to turn the serialized Daml-LF values into in-memory representation.
Type: timer
Qualification: Debug
- Labels:
name: The operation/pool for which the metric is registered.
daml.db.wait*¶
Summary: The time needed to acquire a connection to the database.
Description: SQL statements are run in a dedicated executor. This metric measures the time it takes between creating the SQL statement corresponding to the <operation> and the point when it starts running on the dedicated executor.
Type: timer
Qualification: Debug
- Labels:
name: The operation/pool for which the metric is registered.
daml.grpc.server¶
Summary: Distribution of the durations of serving gRPC requests.
Description:
Type: timer
Qualification: Latency
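Latency distributions such as this one are typically read as percentiles. Assuming the timer is exported as a histogram with cumulative bucket counts, the following Python sketch estimates a quantile by linear interpolation inside the bucket that contains the target rank, which is similar in spirit to what a histogram quantile query does in Prometheus. The bucket bounds and counts are hypothetical.

```python
import math

# Hypothetical cumulative buckets: (upper bound in seconds, cumulative count).
LATENCY_BUCKETS = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]


def estimate_quantile(q, buckets):
    """Estimate the q-quantile from cumulative buckets sorted by upper bound."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for upper, count in buckets:
        if count >= rank:
            if math.isinf(upper) or count == prev_count:
                # No finite upper bound or an empty bucket: fall back to the
                # lower edge of the bucket.
                return prev_bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (upper - prev_bound) * fraction
        prev_bound, prev_count = upper, count
    return prev_bound


print(estimate_quantile(0.95, LATENCY_BUCKETS))
```

The estimate is only as precise as the bucket boundaries, so a percentile that falls inside a wide bucket should be treated as approximate.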
daml.grpc.server.handled¶
Summary: Total number of handled gRPC requests.
Description:
Type: meter
Qualification: Traffic
daml.grpc.server.messages.received¶
Summary: Total number of gRPC messages received (on either type of connection).
Description:
Type: meter
Qualification: Traffic
daml.grpc.server.messages.received.bytes¶
Summary: Distribution of payload sizes in gRPC messages received (both unary and streaming).
Description:
Type: histogram
Qualification: Traffic
daml.grpc.server.messages.sent¶
Summary: Total number of gRPC messages sent (on either type of connection).
Description:
Type: meter
Qualification: Traffic
daml.grpc.server.messages.sent.bytes¶
Summary: Distribution of payload sizes in gRPC messages sent (both unary and streaming).
Description:
Type: histogram
Qualification: Traffic
daml.grpc.server.started¶
Summary: Total number of started gRPC requests (on either type of connection).
Description:
Type: meter
Qualification: Traffic
daml.http.requests¶
Summary: Total number of HTTP requests received.
Description:
Type: meter
Qualification: Debug
daml.http.requests¶
Summary: The duration of the HTTP requests.
Description:
Type: timer
Qualification: Debug
daml.http.requests.payload.bytes¶
Summary: Distribution of the sizes of payloads received in HTTP requests.
Description:
Type: histogram
Qualification: Debug
daml.http.responses.payload.bytes¶
Summary: Distribution of the sizes of payloads sent in HTTP responses.
Description:
Type: histogram
Qualification: Debug
daml.http.websocket.messages.received¶
Summary: Total number of received WebSocket messages.
Description:
Type: meter
Qualification: Debug
daml.http.websocket.messages.received.bytes¶
Summary: Distribution of the size of received WebSocket messages.
Description:
Type: histogram
Qualification: Debug
daml.http.websocket.messages.sent¶
Summary: Total number of sent WebSocket messages.
Description:
Type: meter
Qualification: Debug
daml.http.websocket.messages.sent.bytes¶
Summary: Distribution of the size of sent WebSocket messages.
Description:
Type: histogram
Qualification: Debug
daml.participant.api.commands.delayed_submissions¶
Summary: The number of the delayed Daml commands.
Description: The number of Daml commands that have been delayed internally because they have been evaluated to require the ledger time further in the future than the expected latency.
Type: meter
Qualification: Debug
daml.participant.api.commands.failed_command_interpretations¶
Summary: The number of Daml commands that failed in interpretation.
Description: The number of Daml commands that have been rejected by the interpreter (e.g. badly authorized action).
Type: meter
Qualification: Errors
daml.participant.api.commands.interactive_prepares¶
Summary: The time to prepare a transaction for interactive submission.
Description: The time to validate and interpret a command before it is returned to the caller for external signing.
Type: timer
Qualification: Latency
daml.participant.api.commands.max_in_flight_capacity¶
Summary: The maximum number of Daml commands that can await completion.
Description: The maximum number of Daml commands that can await completion in the Command Service.
Type: counter
Qualification: Debug
daml.participant.api.commands.max_in_flight_length¶
Summary: The number of the Daml commands awaiting completion.
Description: The number of Daml commands currently awaiting completion in the Command Service.
Type: counter
Qualification: Debug
daml.participant.api.commands.prepares_running¶
Summary: The number of the Daml commands for which transactions are currently being prepared by the ledger api server.
Description: The number of the Daml commands that are currently being prepared by the ledger api server (including validation, interpretation).
Type: counter
Qualification: Saturation
daml.participant.api.commands.reassignment_validation¶
Summary: The time to validate a reassignment command.
Description: The time to validate a submitted Daml command before it is fed to the interpreter.
Type: timer
Qualification: Debug
daml.participant.api.commands.submissions¶
Summary: The time to fully process a Daml command.
Description: The time to validate and interpret a command before it is handed over to the synchronization services to be finalized (either committed or rejected).
Type: timer
Qualification: Latency
daml.participant.api.commands.submissions_running¶
Summary: The number of the Daml commands that are currently being handled by the ledger api server.
Description: The number of the Daml commands that are currently being handled by the ledger api server (including validation, interpretation, and handing the transaction over to the synchronization services).
Type: counter
Qualification: Saturation
daml.participant.api.commands.valid_submissions¶
Summary: The total number of the valid Daml commands.
Description: The total number of the Daml commands that have passed validation and were sent to interpretation in this ledger api server process.
Type: meter
Qualification: Debug
daml.participant.api.commands.validation¶
Summary: The time to validate a Daml command.
Description: The time to validate a submitted Daml command before it is fed to the interpreter.
Type: timer
Qualification: Debug
daml.participant.api.execution.cache.contract_state.register_update¶
Summary: The time spent to update the contract state cache.
Description: The total time spent in sequential update steps of the contract state caches updating logic. This metric is created with debugging purposes in mind.
Type: timer
Qualification: Debug
daml.participant.api.execution.cache.key_state.register_update¶
Summary: The time spent to update the key state cache.
Description: The total time spent in sequential update steps of the key state caches updating logic. This metric is created with debugging purposes in mind.
Type: timer
Qualification: Debug
daml.participant.api.execution.engine¶
Summary: The time spent executing a Daml command.
Description: The time spent by the Daml engine executing a Daml command (excluding fetching data).
Type: timer
Qualification: Debug
daml.participant.api.execution.engine_running¶
Summary: The number of Daml commands currently being executed.
Description: The number of the commands that are currently being executed by the Daml engine (excluding fetching data).
Type: counter
Qualification: Debug
daml.participant.api.execution.get_lf_package¶
Summary: The time to fetch individual Daml code packages during interpretation.
Description: The interpretation of a command in the ledger api server might require fetching multiple Daml packages. This metric exposes the time needed to fetch the packages that are necessary for interpretation.
Type: timer
Qualification: Debug
daml.participant.api.execution.lookup_active_contract¶
Summary: The time to look up individual active contracts during interpretation.
Description: The interpretation of a command in the ledger api server might require fetching multiple active contracts. This metric exposes the time to look up individual active contracts.
Type: timer
Qualification: Debug
daml.participant.api.execution.lookup_active_contract_count_per_execution¶
Summary: The number of the active contracts looked up per Daml command.
Description: The interpretation of a command in the ledger api server might require fetching multiple active contracts. This metric exposes the number of active contracts that must be looked up to process a Daml command.
Type: histogram
Qualification: Debug
daml.participant.api.execution.lookup_active_contract_per_execution¶
Summary: The compound time to look up all active contracts in a single Daml command.
Description: The interpretation of a command in the ledger api server might require fetching multiple active contracts. This metric exposes the compound time to look up all the active contracts in a single Daml command.
Type: timer
Qualification: Debug
daml.participant.api.execution.lookup_contract_key¶
Summary: The time to look up individual contract keys during interpretation.
Description: The interpretation of a command in the ledger api server might require fetching multiple contract keys. This metric exposes the time needed to look up individual contract keys.
Type: timer
Qualification: Debug
daml.participant.api.execution.lookup_contract_key_count_per_execution¶
Summary: The number of contract keys looked up per Daml command.
Description: The interpretation of a command in the ledger api server might require fetching multiple contract keys. This metric exposes the number of contract keys that must be looked up to process a Daml command.
Type: histogram
Qualification: Debug
daml.participant.api.execution.lookup_contract_key_per_execution¶
Summary: The compound time to look up all contract keys in a single Daml command.
Description: The interpretation of a command in the ledger api server might require fetching multiple contract keys. This metric exposes the compound time needed to look up all the contract keys in a single Daml command.
Type: timer
Qualification: Debug
daml.participant.api.execution.retry¶
Summary: The number of the interpretation retries.
Description: The total number of interpretation retries attempted due to mismatching ledger effective time in this ledger api server process.
Type: meter
Qualification: Debug
daml.participant.api.execution.total¶
Summary: The overall time spent interpreting a Daml command.
Description: The time spent interpreting a Daml command in the ledger api server (includes executing Daml and fetching data).
Type: timer
Qualification: Debug
daml.participant.api.execution.total_running¶
Summary: The number of Daml commands currently being interpreted.
Description: The number of the commands that are currently being interpreted (includes executing Daml code and fetching data).
Type: counter
Qualification: Debug
daml.participant.api.index.active_contracts_buffer_size¶
Summary: The buffer size for active contracts requests.
Description: A Pekko stream buffer is added at the end of all streaming queries to absorb temporary downstream backpressure (e.g. when the client is slower than upstream delivery throughput). This metric gauges the size of the buffer for queries requesting active contracts that satisfy a given predicate.
Type: counter
Qualification: Debug
daml.participant.api.index.completions_buffer_size¶
Summary: The buffer size for completions requests.
Description: A Pekko stream buffer is added at the end of all streaming queries to absorb temporary downstream backpressure (e.g. when the client is slower than upstream delivery throughput). This metric gauges the size of the buffer for queries requesting the completed commands in a specific period of time.
Type: counter
Qualification: Debug
daml.participant.api.index.db.active_contract_keys_lookup.batch.batch_size¶
Summary: The batch sizes in the lookup batch-loading Contract Service.
Description: The number of lookups contained in a batch, used in the batch-loading Contract Service.
Type: histogram
Qualification: Debug
daml.participant.api.index.db.active_contract_keys_lookup.batch.buffer_capacity¶
Summary: The capacity of the lookup queue.
Description: The maximum number of elements that can be kept in the queue of lookups in the batch-loading queue of the Contract Service.
Type: counter
Qualification: Debug
daml.participant.api.index.db.active_contract_keys_lookup.batch.buffer_delay¶
Summary: The queuing delay for the lookup queue.
Description: The queuing delay for the pending lookups in the batch-loading queue of the Contract Service.
Type: timer
Qualification: Debug
daml.participant.api.index.db.active_contract_keys_lookup.batch.buffer_length¶
Summary: The number of the currently pending lookups.
Description: The number of the currently pending lookups in the batch-loading queue of the Contract Service.
Type: counter
Qualification: Debug
daml.participant.api.index.db.active_contract_lookup.batch.batch_size¶
Summary: The batch sizes in the lookup batch-loading Contract Service.
Description: The number of lookups contained in a batch, used in the batch-loading Contract Service.
Type: histogram
Qualification: Debug
daml.participant.api.index.db.active_contract_lookup.batch.buffer_capacity¶
Summary: The capacity of the lookup queue.
Description: The maximum number of elements that can be kept in the queue of lookups in the batch-loading queue of the Contract Service.
Type: counter
Qualification: Debug
daml.participant.api.index.db.active_contract_lookup.batch.buffer_delay¶
Summary: The queuing delay for the lookup queue.
Description: The queuing delay for the pending lookups in the batch-loading queue of the Contract Service.
Type: timer
Qualification: Debug
daml.participant.api.index.db.active_contract_lookup.batch.buffer_length¶
Summary: The number of the currently pending lookups.
Description: The number of the currently pending lookups in the batch-loading queue of the Contract Service.
Type: counter
Qualification: Debug
daml.participant.api.index.db.flat_transactions_stream.translation¶
Summary: The time needed to turn serialized Daml-LF values into in-memory objects.
Description: Some index database queries that target contracts and transactions involve a Daml-LF translation step. For such queries this metric stands for the time it takes to turn the serialized Daml-LF values into in-memory representation.
Type: timer
Qualification: Debug
daml.participant.api.index.db.lookup_active_contract¶
Summary: The time spent fetching a contract using its id.
Description: This metric exposes the time spent fetching a contract using its id from the index db. It is then used by the Daml interpreter when evaluating a command into a transaction.
Type: timer
Qualification: Debug
daml.participant.api.index.db.lookup_key¶
Summary: The time spent looking up a contract using its key.
Description: This metric exposes the time spent looking up a contract using its key in the index db. It is then used by the Daml interpreter when evaluating a command into a transaction.
Type: timer
Qualification: Debug
daml.participant.api.index.db.reassignment_stream.translation¶
Summary: The time needed to turn serialized Daml-LF values into in-memory objects.
Description: Some index database queries that target contracts and transactions involve a Daml-LF translation step. For such queries this metric stands for the time it takes to turn the serialized Daml-LF values into in-memory representation.
Type: timer
Qualification: Debug
daml.participant.api.index.db.tree_transactions_stream.translation¶
Summary: The time needed to turn serialized Daml-LF values into in-memory objects.
Description: Some index database queries that target contracts and transactions involve a Daml-LF translation step. For such queries this metric stands for the time it takes to turn the serialized Daml-LF values into in-memory representation.
Type: timer
Qualification: Debug
daml.participant.api.index.ledger_end_sequential_id¶
Summary: The sequential id of the current ledger end kept in memory.
Description: The ledger end’s sequential id is a monotonically increasing integer value representing the sequential id ascribed to the most recent ledger event ingested by the index db. Note that only a subset of all ledger events are ingested and given a sequential id: creates, consuming exercises, non-consuming exercises, and divulgence events. This value can be treated as a counter of all such events visible to a given participant. This metric exposes the latest ledger end’s sequential id registered in the in-memory data set.
Type: gauge
Qualification: Debug
daml.participant.api.index.transaction_trees_buffer_size¶
Summary: The buffer size for transaction trees requests.
Description: A Pekko stream buffer is added at the end of all streaming queries to absorb temporary downstream backpressure (e.g. when the client is slower than upstream delivery throughput). This metric gauges the size of the buffer for queries requesting transaction trees.
Type: counter
Qualification: Debug
daml.participant.api.index.updates_buffer_size¶
Summary: The buffer size for streaming updates requests.
Description: A Pekko stream buffer is added at the end of all streaming queries to absorb temporary downstream backpressure (e.g. when the client is slower than upstream delivery throughput). This metric gauges the size of the buffer for queries requesting updates in a specific period of time that satisfy a given predicate.
Type: counter
Qualification: Debug
daml.participant.api.indexer.events*¶
Summary: Number of ledger events processed.
Description: Represents the total number of ledger events processed (transactions, reassignments, party allocations).
Type: meter
Qualification: Debug
- Labels:
participant_id: The id of the participant.
user_id: The user generating the events.
event_type: The type of ledger event processed (transaction, reassignment, party_allocation).
status: Indicates whether the event was accepted or not. Possible values: accepted, rejected.
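The labels on this metric make it possible to break the event count down by type or by outcome. The following Python sketch groups a handful of samples by a chosen label and derives the share of rejected events. The sample values and helper names are hypothetical, and only the label names and values come from the list above.

```python
from collections import defaultdict


def totals_by(samples, label):
    """Sum sample values grouped by the value of one label."""
    totals = defaultdict(float)
    for labels, value in samples:
        totals[labels.get(label, "")] += value
    return dict(totals)


def rejection_ratio(samples):
    """Share of events whose status label is 'rejected'."""
    by_status = totals_by(samples, "status")
    total = sum(by_status.values())
    return by_status.get("rejected", 0.0) / total if total else 0.0


# Hypothetical samples: (labels, counter value).
samples = [
    ({"event_type": "transaction", "status": "accepted"}, 90.0),
    ({"event_type": "transaction", "status": "rejected"}, 10.0),
    ({"event_type": "party_allocation", "status": "accepted"}, 5.0),
    ({"event_type": "reassignment", "status": "accepted"}, 20.0),
]

print(totals_by(samples, "event_type"))
print(rejection_ratio(samples))
```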
daml.participant.api.indexer.indexer_queue_blocked¶
Summary: The number of blocked enqueue operations for the indexer queue.
Description: The indexer queue exerts backpressure by blocking asynchronous enqueue operations. This meter measures the number of such blocked operations, signalling backpressure materializing from downstream.
Type: meter
Qualification: Debug
daml.participant.api.indexer.indexer_queue_buffered¶
Summary: The size of the buffer before the indexer.
Description: This buffer is located before the indexer. An increasing amount signals mounting backpressure.
Type: meter
Qualification: Debug
daml.participant.api.indexer.indexer_queue_uncommitted¶
Summary: The number of entries that are uncommitted for the indexer.
Description: Uncommitted entries comprise all blocked, buffered, and submitted but not yet committed entries. This amount signals the momentum of stream processing, and has a theoretical maximum defined by all the queue parameters.
Type: meter
Qualification: Debug
daml.participant.api.indexer.ledger_end_sequential_id¶
Summary: The sequential id of the current ledger end kept in the database.
Description: The ledger end’s sequential id is a monotonically increasing integer value representing the sequential id ascribed to the most recent ledger event ingested by the index db. Note that only a subset of all ledger events is ingested and given a sequential id. These are: creates, consuming exercises, non-consuming exercises and divulgence events. This value can be treated as a counter of all such events visible to a given participant. This metric exposes the latest ledger end’s sequential id registered in the database.
Type: gauge
Qualification: Debug
daml.participant.api.indexer.metered_events*¶
Summary: Number of individual ledger events (create, exercise, archive).
Description: Represents the number of individual ledger events constituting a transaction.
Type: meter
Qualification: Debug
- Labels:
participant_id: The id of the participant.
user_id: The user generating the events.
daml.participant.api.indexer.output_batched_buffer_length¶
Summary: The size of the queue between the indexer and the in-memory state updating flow.
Description: This counter counts batches of updates passed to the in-memory flow. Batches are dynamically-sized based on amount of backpressure exerted by the downstream stages of the flow.
Type: counter
Qualification: Debug
daml.participant.api.indexer.updates¶
Summary: The number of the state updates persisted to the database.
Description: The number of state updates persisted to the database. These include updates such as accepted transactions, configuration changes, party allocations, and rejections, as well as synthetic events emitted when the node learns that the sequencer clock has advanced without any actual ledger event, for example due to submission receipts or time proofs.
Type: counter
Qualification: Traffic
daml.participant.api.lapi.streams.acs_sent¶
Summary: The number of the active contracts sent by the ledger api.
Description: The total number of active contracts sent over the ledger api streams to all clients.
Type: counter
Qualification: Traffic
daml.participant.api.lapi.streams.active¶
Summary: The number of the active streams served by the ledger api.
Description: The number of ledger api streams currently being served to all clients.
Type: gauge
Qualification: Debug
daml.participant.api.lapi.streams.completions_sent¶
Summary: The number of the command completions sent by the ledger api.
Description: The total number of completions sent over the ledger api streams to all clients.
Type: counter
Qualification: Traffic
daml.participant.api.lapi.streams.transaction_trees_sent¶
Summary: The number of the transaction trees sent over the ledger api.
Description: The total number of the transaction trees sent over the ledger api streams to all clients.
Type: counter
Qualification: Traffic
daml.participant.api.lapi.streams.update_trees_sent¶
Summary: The number of the update trees sent over the ledger api.
Description: The total number of the update trees sent over the ledger api streams to all clients.
Type: counter
Qualification: Traffic
daml.participant.api.lapi.streams.updates_sent¶
Summary: The number of the flat updates sent over the ledger api.
Description: The total number of the flat updates sent over the ledger api streams to all clients.
Type: counter
Qualification: Traffic
daml.participant.api.services.current_ledger_end¶
Summary: The time to execute an index service operation.
Description: The index service is an internal component responsible for access to the index db data. Its operations are invoked whenever a client request received over the ledger api requires access to the index db. This metric captures time statistics of such operations.
Type: timer
Qualification: Debug
daml.participant.api.services.get_active_contracts¶
Summary: The time to execute an index service operation.
Description: The index service is an internal component responsible for access to the index db data. Its operations are invoked whenever a client request received over the ledger api requires access to the index db. This metric captures time statistics of such operations.
Type: timer
Qualification: Debug
daml.participant.api.services.get_completions¶
Summary: The time to execute an index service operation.
Description: The index service is an internal component responsible for access to the index db data. Its operations are invoked whenever a client request received over the ledger api requires access to the index db. This metric captures time statistics of such operations.
Type: timer
Qualification: Debug
daml.participant.api.services.get_events_by_contract_id¶
Summary: The time to execute an index service operation.
Description: The index service is an internal component responsible for access to the index db data. Its operations are invoked whenever a client request received over the ledger api requires access to the index db. This metric captures time statistics of such operations.
Type: timer
Qualification: Debug
daml.participant.api.services.get_lf_archive¶
Summary: The time to execute an index service operation.
Description: The index service is an internal component responsible for access to the index db data. Its operations are invoked whenever a client request received over the ledger api requires access to the index db. This metric captures time statistics of such operations.
Type: timer
Qualification: Debug
daml.participant.api.services.get_participant_id¶
Summary: The time to execute an index service operation.
Description: The index service is an internal component responsible for access to the index db data. Its operations are invoked whenever a client request received over the ledger api requires access to the index db. This metric captures time statistics of such operations.
Type: timer
Qualification: Debug
daml.participant.api.services.get_parties¶
Summary: The time to execute an index service operation.
Description: The index service is an internal component responsible for access to the index db data. Its operations are invoked whenever a client request received over the ledger api requires access to the index db. This metric captures time statistics of such operations.
Type: timer
Qualification: Debug
daml.participant.api.services.get_transaction_by_id¶
Summary: The time to execute an index service operation.
Description: The index service is an internal component responsible for access to the index db data. Its operations are invoked whenever a client request received over the ledger api requires access to the index db. This metric captures time statistics of such operations.
Type: timer
Qualification: Debug
daml.participant.api.services.get_transaction_by_offset¶
Summary: The time to execute an index service operation.
Description: The index service is an internal component responsible for access to the index db data. Its operations are invoked whenever a client request received over the ledger api requires access to the index db. This metric captures time statistics of such operations.
Type: timer
Qualification: Debug
daml.participant.api.services.get_transaction_tree_by_id¶
Summary: The time to execute an index service operation.
Description: The index service is an internal component responsible for access to the index db data. Its operations are invoked whenever a client request received over the ledger api requires access to the index db. This metric captures time statistics of such operations.
Type: timer
Qualification: Debug
daml.participant.api.services.get_transaction_tree_by_offset¶
Summary: The time to execute an index service operation.
Description: The index service is an internal component responsible for access to the index db data. Its operations are invoked whenever a client request received over the ledger api requires access to the index db. This metric captures time statistics of such operations.
Type: timer
Qualification: Debug
daml.participant.api.services.get_update_by_id¶
Summary: The time to execute an index service operation.
Description: The index service is an internal component responsible for access to the index db data. Its operations are invoked whenever a client request received over the ledger api requires access to the index db. This metric captures time statistics of such operations.
Type: timer
Qualification: Debug
daml.participant.api.services.get_update_by_offset¶
Summary: The time to execute an index service operation.
Description: The index service is an internal component responsible for access to the index db data. Its operations are invoked whenever a client request received over the ledger api requires access to the index db. This metric captures time statistics of such operations.
Type: timer
Qualification: Debug
daml.participant.api.services.index.in_memory_fan_out_buffer.prune¶
Summary: The time to remove all elements from the in-memory fan-out buffer.
Description: It is possible to remove the oldest entries of the in-memory fan out buffer. This metric exposes the time needed to prune the buffer.
Type: timer
Qualification: Debug
daml.participant.api.services.index.in_memory_fan_out_buffer.push¶
Summary: The time to add a new event into the buffer.
Description: The in-memory fan-out buffer is a buffer that stores the last ingested maxBufferSize accepted and rejected submission updates as TransactionLogUpdate. It allows bypassing IndexDB persistence fetches for recent updates for flat and transaction tree streams, command completion streams and by-event-id and by-transaction-id flat and transaction tree lookups. This metric exposes the time spent on adding a new event into the buffer.
Type: timer
Qualification: Debug
daml.participant.api.services.index.in_memory_fan_out_buffer.size¶
Summary: The size of the in-memory fan-out buffer.
Description: The actual size of the in-memory fan-out buffer. This metric is mostly targeted for debugging purposes.
Type: histogram
Qualification: Saturation
daml.participant.api.services.index.write.allocate_party¶
Summary: The time to execute a write service operation.
Description: The write service is an internal interface for changing the state through the synchronization services. The methods in this interface are all methods that are supported uniformly across all ledger implementations. This metric exposes the time needed to execute each operation.
Type: timer
Qualification: Debug
daml.participant.api.services.index.write.prune¶
Summary: The time to execute a write service operation.
Description: The write service is an internal interface for changing the state through the synchronization services. The methods in this interface are all methods that are supported uniformly across all ledger implementations. This metric exposes the time needed to execute each operation.
Type: timer
Qualification: Debug
daml.participant.api.services.index.write.submit_reassignment¶
Summary: The time to execute a write service operation.
Description: The write service is an internal interface for changing the state through the synchronization services. The methods in this interface are all methods that are supported uniformly across all ledger implementations. This metric exposes the time needed to execute each operation.
Type: timer
Qualification: Debug
daml.participant.api.services.index.write.submit_reassignment_running¶
Summary: The time to execute a write service operation.
Description: The write service is an internal interface for changing the state through the synchronization services. The methods in this interface are all methods that are supported uniformly across all ledger implementations. This metric exposes the time needed to execute each operation.
Type: counter
Qualification: Debug
daml.participant.api.services.index.write.submit_transaction¶
Summary: The time to execute a write service operation.
Description: The write service is an internal interface for changing the state through the synchronization services. The methods in this interface are all methods that are supported uniformly across all ledger implementations. This metric exposes the time needed to execute each operation.
Type: timer
Qualification: Debug
daml.participant.api.services.index.write.submit_transaction_running¶
Summary: The time to execute a write service operation.
Description: The write service is an internal interface for changing the state through the synchronization services. The methods in this interface are all methods that are supported uniformly across all ledger implementations. This metric exposes the time needed to execute each operation.
Type: counter
Qualification: Debug
daml.participant.api.services.index.write.upload_packages¶
Summary: The time to execute a write service operation.
Description: The write service is an internal interface for changing the state through the synchronization services. The methods in this interface are all methods that are supported uniformly across all ledger implementations. This metric exposes the time needed to execute each operation.
Type: timer
Qualification: Debug
daml.participant.api.services.latest_pruned_offsets¶
Summary: The time to execute an index service operation.
Description: The index service is an internal component responsible for access to the index db data. Its operations are invoked whenever a client request received over the ledger api requires access to the index db. This metric captures time statistics of such operations.
Type: timer
Qualification: Debug
daml.participant.api.services.list_known_parties¶
Summary: The time to execute an index service operation.
Description: The index service is an internal component responsible for access to the index db data. Its operations are invoked whenever a client request received over the ledger api requires access to the index db. This metric captures time statistics of such operations.
Type: timer
Qualification: Debug
daml.participant.api.services.list_lf_packages¶
Summary: The time to execute an index service operation.
Description: The index service is an internal component responsible for access to the index db data. Its operations are invoked whenever a client request received over the ledger api requires access to the index db. This metric captures time statistics of such operations.
Type: timer
Qualification: Debug
daml.participant.api.services.lookup_active_contract¶
Summary: The time to execute an index service operation.
Description: The index service is an internal component responsible for access to the index db data. Its operations are invoked whenever a client request received over the ledger api requires access to the index db. This metric captures time statistics of such operations.
Type: timer
Qualification: Debug
daml.participant.api.services.lookup_configuration¶
Summary: The time to execute an index service operation.
Description: The index service is an internal component responsible for access to the index db data. Its operations are invoked whenever a client request received over the ledger api requires access to the index db. This metric captures time statistics of such operations.
Type: timer
Qualification: Debug
daml.participant.api.services.lookup_contract_key¶
Summary: The time to execute an index service operation.
Description: The index service is an internal component responsible for access to the index db data. Its operations are invoked whenever a client request received over the ledger api requires access to the index db. This metric captures time statistics of such operations.
Type: timer
Qualification: Debug
daml.participant.api.services.lookup_contract_state¶
Summary: The time to execute an index service operation.
Description: The index service is an internal component responsible for access to the index db data. Its operations are invoked whenever a client request received over the ledger api requires access to the index db. This metric captures time statistics of such operations.
Type: timer
Qualification: Debug
daml.participant.api.services.lookup_maximum_ledger_time¶
Summary: The time to execute an index service operation.
Description: The index service is an internal component responsible for access to the index db data. Its operations are invoked whenever a client request received over the ledger api requires access to the index db. This metric captures time statistics of such operations.
Type: timer
Qualification: Debug
daml.participant.api.services.party_entries¶
Summary: The time to execute an index service operation.
Description: The index service is an internal component responsible for access to the index db data. Its operations are invoked whenever a client request received over the ledger api requires access to the index db. This metric captures time statistics of such operations.
Type: timer
Qualification: Debug
daml.participant.api.services.prune¶
Summary: The time to execute an index service operation.
Description: The index service is an internal component responsible for access to the index db data. Its operations are invoked whenever a client request received over the ledger api requires access to the index db. This metric captures time statistics of such operations.
Type: timer
Qualification: Debug
daml.participant.api.services.pruning.prune.completed¶
Summary: Total number of completed pruning processes.
Description:
Type: meter
Qualification: Debug
daml.participant.api.services.pruning.prune.started¶
Summary: Total number of started pruning processes.
Description:
Type: meter
Qualification: Debug
daml.participant.api.services.read.get_connected_synchronizers¶
Summary: The time to execute a read service operation.
Description: The read service is an internal interface for reading the events from the synchronization interfaces. The metrics expose the time needed to execute each operation.
Type: timer
Qualification: Debug
daml.participant.api.services.read.get_lf_archive¶
Summary: The time to execute a read service operation.
Description: The read service is an internal interface for reading the events from the synchronization interfaces. The metrics expose the time needed to execute each operation.
Type: timer
Qualification: Debug
daml.participant.api.services.read.incomplete_reassignment_offsets¶
Summary: The time to execute a read service operation.
Description: The read service is an internal interface for reading the events from the synchronization interfaces. The metrics expose the time needed to execute each operation.
Type: timer
Qualification: Debug
daml.participant.api.services.read.list_lf_packages¶
Summary: The time to execute a read service operation.
Description: The read service is an internal interface for reading the events from the synchronization interfaces. The metrics expose the time needed to execute each operation.
Type: timer
Qualification: Debug
daml.participant.api.services.read.state_updates¶
Summary: The time to execute a read service operation.
Description: The read service is an internal interface for reading the events from the synchronization interfaces. The metrics expose the time needed to execute each operation.
Type: timer
Qualification: Debug
daml.participant.api.services.read.validate_dar¶
Summary: The time to execute a read service operation.
Description: The read service is an internal interface for reading the events from the synchronization interfaces. The metrics expose the time needed to execute each operation.
Type: timer
Qualification: Debug
daml.participant.api.services.transaction_trees¶
Summary: The time to execute an index service operation.
Description: The index service is an internal component responsible for access to the index db data. Its operations are invoked whenever a client request received over the ledger api requires access to the index db. This metric captures time statistics of such operations.
Type: timer
Qualification: Debug
daml.participant.api.services.transactions¶
Summary: The time to execute an index service operation.
Description: The index service is an internal component responsible for access to the index db data. Its operations are invoked whenever a client request received over the ledger api requires access to the index db. This metric captures time statistics of such operations.
Type: timer
Qualification: Debug
daml.participant.console.tx-node-count¶
Summary: Number of nodes per transaction histogram, measured using canton console ledger_api.updates.start_measure
Description:
Type: histogram
Qualification: Debug
daml.participant.console.tx-nodes-emitted¶
Summary: Total number of nodes emitted, measured using canton console ledger_api.updates.start_measure
Description:
Type: meter
Qualification: Debug
daml.participant.console.tx-size¶
Summary: Transaction size histogram, measured using canton console ledger_api.updates.start_measure
Description:
Type: histogram
Qualification: Debug
daml.participant.declarative_api.errors¶
Summary: Errors for the last update
Description: The node will attempt to apply the changes configured in the declarative config file. A positive number means that some items failed to be synchronised. A negative number means that the overall synchronisation procedure failed with an error. Values: 0 = everything good, -1 = config file unreadable, -2 = context could not be created, -3 = failure while applying items, -9 = exception caught.
Type: gauge
Qualification: Errors
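As an illustration of how the daml.participant.declarative_api.errors gauge could be interpreted in an alerting script, here is a minimal Python sketch. The function name and the returned strings are assumptions made for this example; only the numeric codes are taken from the description above.

```python
# Hypothetical helper: turn the gauge value into a readable status.
# The codes mirror the metric description; everything else is illustrative.
NEGATIVE_CODES = {
    -1: "config file unreadable",
    -2: "context could not be created",
    -3: "failure while applying items",
    -9: "exception caught",
}


def describe_declarative_errors(value: int) -> str:
    """Explain a sample of the declarative_api.errors gauge."""
    if value == 0:
        return "everything good"
    if value > 0:
        # Positive values count the items that failed to synchronise.
        return f"{value} item(s) failed to synchronise"
    # Negative values indicate that the overall procedure failed.
    return NEGATIVE_CODES.get(value, f"unknown error code {value}")
```

For example, a sample of 3 would be reported as three failed items, while -1 would point at an unreadable config file.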
daml.participant.declarative_api.items¶
Summary: Number of items managed through the declarative API
Description: This metric indicates the number of items managed through the declarative API
Type: gauge
Qualification: Debug
daml.participant.http_json_api.command_submission_ledger_timing¶
Summary:
Description:
Type: timer
Qualification: Debug
daml.participant.http_json_api.db_find_by_contract_id_timing¶
Summary:
Description:
Type: timer
Qualification: Debug
daml.participant.http_json_api.incoming_json_parsing_and_validation_timing¶
Summary:
Description:
Type: timer
Qualification: Debug
daml.participant.http_json_api.response_creation_timing¶
Summary:
Description:
Type: timer
Qualification: Debug
daml.participant.http_json_api.websocket_request_count¶
Summary:
Description:
Type: counter
Qualification: Debug
daml.participant.inflight_validation_requests*¶
Summary: Number of requests being validated.
Description: Number of requests that are currently being validated. This also covers requests submitted by other participants.
Type: gauge
Qualification: Saturation
- Labels:
participant: The id of the participant for which the value applies.
daml.participant.sync.commitments.catchup-mode-enabled¶
Summary: Measures how many times the commitment processor catch-up mode has been triggered.
Description: Participant nodes compute bilateral commitments at regular intervals. This metric exposes how often the catch-up mode has been activated. The catch-up mode is triggered according to catch-up config and happens if the participant lags behind on computation. A healthy value is 0. An increasing value indicates intermittent periods when a participant alternates between healthy and struggling to keep up with commitment computation. However, we do not see a constantly increasing value for a participant that is consistently behind commitment computation because, once catch-up mode is activated, the participant remains in catch-up mode until it has completely caught up, and only triggers the metric once. In order to troubleshoot non-zero values, the operator should cross-correlate this value with the daml.participant.sync.commitments.compute metric.
Type: meter
Qualification: Debug
daml.participant.sync.commitments.compute¶
Summary: Measures the time that the participant node spends computing commitments.
Description: Participant nodes compute bilateral commitments at regular intervals, i.e., reconciliation intervals. This metric exposes the time spent on each computation in milliseconds. There are two cases that the operator should pay attention to. First, fluctuations in this value are expected if the number of counter-participants or common stakeholder groups changes. However, changes with no apparent reason could indicate a bug and the operator should monitor closely. Second, it is a cause of concern if the value starts approaching or is greater than the reconciliation interval: The participant will perpetually lag behind, because it needs to compute commitments more frequently than it can manage. The operator should consider asking the synchronizer operator to increase the reconciliation interval if the increase in commitment computation is expected, or otherwise investigate the cause.
Type: timer
Qualification: Debug
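The second case described for daml.participant.sync.commitments.compute (computation time approaching or exceeding the reconciliation interval) can be sketched as a simple check. This is a rough illustration only: the function name, the status labels, and the 0.8 warning ratio are assumptions, not values defined by Canton.

```python
def commitment_compute_status(
    compute_ms: float,
    reconciliation_interval_ms: float,
    warn_ratio: float = 0.8,  # assumed threshold for "approaching"
) -> str:
    """Classify commitment computation time against the reconciliation interval."""
    if reconciliation_interval_ms <= 0:
        raise ValueError("reconciliation interval must be positive")
    ratio = compute_ms / reconciliation_interval_ms
    if ratio >= 1.0:
        # Computation takes at least as long as the interval:
        # the participant will perpetually lag behind.
        return "lagging"
    if ratio >= warn_ratio:
        return "approaching"
    return "ok"
```

With a reconciliation interval of 60 seconds, a computation time of 30 seconds would be classified as "ok" and 50 seconds as "approaching" under these assumed thresholds.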
daml.participant.sync.commitments.sequencing-time¶
Summary: Measures the time between the end of a commitment period, and the time when the sequencer observes the corresponding commitment.
Description: Participant nodes compute bilateral commitments at regular intervals. After a participant computes a commitment, it sends it for sequencing. The time between the end of a commitment interval and sequencing is measured in milliseconds. Because commitment computation is included in the measured time, the value is always greater than the daml.participant.sync.commitments.compute metric. The operator should pay attention to fluctuations of this value. An increase can be expected, e.g., because the computation time increases. However, an increase can also be a cause of concern, because it can indicate that the participant is lagging behind in processing messages and computing commitments, which is accompanied by ACS_COMMITMENT_DEGRADATION warnings in the participant logs. An increase can also indicate that the sequencer is slow in sequencing the commitment messages. The operator should cross-correlate with sequencing metrics such as daml.sequencer-client.submissions.sequencing and daml.sequencer-client.handler.delay. In this case, the operator should consider changing the preferred sequencer configuration.
Type: gauge
Qualification: Debug
daml.participant.sync.commitments.synchronizer.largest-counter-participant-latency¶
Summary: The highest latency in micros for commitments outstanding from counter-participants for more than a threshold-number of reconciliation intervals.
Description: Participant nodes compute bilateral commitments at regular intervals and send them. This metric is the default indicator of a counter-participant being slow. The metric exposes the highest latency of a counter-participant, measured by subtracting the highest known counter-participant latency from the most recent period processed by the participant. A counter-participant has to send a commitment at least once in order to appear here. The operator of a participant can configure a default threshold per synchronizer that the participant connects to. The smaller the threshold, the more sensitive the metric is to even small delays in receiving commitments from counter-participants. For example, for a threshold of 5 intervals and a reconciliation interval of 1 minute, the metric measures the latency of counter-participants that have sent no commitments for periods covering the last 5 minutes observed by the participant.
Type: gauge
Qualification: Debug
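The worked example in the description (a threshold of 5 intervals with a 1 minute reconciliation interval covering the last 5 minutes) is a plain multiplication. A minimal Python sketch, using a hypothetical helper name:

```python
from datetime import timedelta


def latency_window(
    threshold_intervals: int, reconciliation_interval: timedelta
) -> timedelta:
    """Time span covered by the configured threshold of reconciliation intervals."""
    return threshold_intervals * reconciliation_interval


# Example from the description: 5 intervals of 1 minute each.
window = latency_window(5, timedelta(minutes=1))
```

Here `window` equals `timedelta(minutes=5)`. Lowering the threshold shrinks this window, which makes the metric react to shorter delays.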
daml.participant.sync.commitments.synchronizer.largest-distinguished-counter-participant-latency¶
Summary: The highest latency in micros for commitments outstanding from distinguished counter-participants for more than a threshold-number of reconciliation intervals.
Description: Participant nodes compute bilateral commitments at regular intervals and send them. This metric indicates that a distinguished counter-participant is slow, i.e., the participant cannot confirm that its state is the same as that of a counter-participant with whom the operator has an important business relation. The metric exposes the highest latency of a counter-participant, measured by subtracting the highest known counter-participant latency from the most recent period processed by the participant. A counter-participant has to send a commitment at least once in order to appear here. The operator of a participant can configure a default threshold per synchronizer that the participant connects to. The smaller the threshold, the more sensitive the metric is to even small delays in receiving commitments from counter-participants. For example, for a threshold of 5 intervals and a reconciliation interval of 1 minute, the metric measures the latency of counter-participants that have sent no commitments for periods covering the last 5 minutes observed by the participant.
Type: gauge
Qualification: Debug
daml.participant.sync.conflict-detection.sequencer-counter-queue¶
Summary: Size of conflict detection sequencer counter queue
Description: The task scheduler will work off tasks according to the timestamp order, scheduling the tasks whenever a new timestamp has been observed. This metric exposes the number of un-processed sequencer messages that will trigger a timestamp advancement.
Type: counter
Qualification: Debug
daml.participant.sync.in-flight-submission-synchronizer-tracker.unsequenced-in-flight-submissions¶
Summary: Number of unsequenced submissions in-flight.
Description: Number of unsequenced submissions in-flight. Unsequenced in-flight submissions are tracked in memory, so a high number here translates into memory pressure.
Type: gauge
Qualification: Saturation
daml.participant.sync.inflight-validations¶
Summary: Number of requests being validated on the synchronizer.
Description: Number of requests that are currently being validated on the synchronizer. This also covers requests submitted by other participants.
Type: counter
Qualification: Saturation
daml.participant.sync.protocol-messages.confirmation-request-creation¶
Summary: Time to create a transaction confirmation request
Description: The time that the transaction protocol processor needs to create a transaction confirmation request.
Type: timer
Qualification: Latency
daml.participant.sync.protocol-messages.confirmation-request-size¶
Summary: Confirmation request size
Description: Records the histogram of the sizes of (transaction) confirmation requests.
Type: histogram
Qualification: Debug
daml.participant.sync.protocol-messages.transaction-message-receipt¶
Summary: Time to parse and decrypt a transaction message
Description: The time that the transaction protocol processor needs to parse and decrypt an incoming confirmation request.
Type: timer
Qualification: Debug
daml.participant.sync.request-tracker.sequencer-counter-queue¶
Summary: Size of record order publisher sequencer counter queue
Description: Same as for conflict-detection, but measuring the sequencer counter queue for publishing to the ledger api server according to record time.
Type: counter
Qualification: Debug
daml.pruning¶
Summary: Duration of prune operations.
Description: This timer exposes the duration of pruning requests from the Canton portion of the ledger.
Type: timer
Qualification: Saturation
daml.pruning.max-event-age¶
Summary: Age of oldest unpruned event.
Description: This gauge exposes the age of the oldest, unpruned event in hours as a way to quantify the pruning backlog.
Type: gauge
Qualification: Saturation
daml.sequencer-client.handler.actual-in-flight-event-batches¶
Summary: Nodes process the events from the synchronizer’s sequencer in batches. This metric tracks how many such batches are processed in parallel.
Description: Incoming messages are processed by a sequencer client, which combines them into batches of size up to ‘event-inbox-size’ before sending them to an application handler for processing. Depending on the system’s configuration, the rate at which event batches are sent to the handler may be throttled to avoid overwhelming it with too many events at once. Indicators that the configured upper bound may be too low: this metric is constantly close to the configured maximum, which is exposed via ‘max-in-flight-event-batches’, while the system’s resources are under-utilized. Indicators that the configured upper bound may be too high: out-of-memory errors crashing the JVM, or frequent garbage collection cycles that slow down processing. The metric tracks how many of these batches have been sent to the application handler but have not yet been fully processed. This metric can help identify potential bottlenecks or issues with the application’s processing of events and provide insights into the overall workload of the system.
Type: counter
Qualification: Saturation
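The "upper bound may be too low" indicator described above can be sketched as a simple check over a window of samples. This is a minimal Python sketch under stated assumptions: the `BatchSample` type, the CPU utilization input, and both thresholds are hypothetical and would need tuning for a real deployment.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BatchSample:
    actual_in_flight: int   # value of actual-in-flight-event-batches
    max_in_flight: int      # value of max-in-flight-event-batches
    cpu_utilization: float  # 0.0 to 1.0, from a host-level metric (assumption)


def bound_looks_too_low(
    samples: Sequence[BatchSample],
    near_max: float = 0.9,  # hypothetical threshold for "close to the maximum"
    idle_cpu: float = 0.5,  # hypothetical threshold for "under-utilized"
) -> bool:
    """True if every sample is near the configured maximum while CPU stays idle."""
    if not samples:
        return False
    return all(
        s.max_in_flight > 0
        and s.actual_in_flight / s.max_in_flight >= near_max
        and s.cpu_utilization < idle_cpu
        for s in samples
    )
```

Requiring every sample in the window to match mirrors the word "constantly" in the description; a single spike near the maximum does not trigger the check.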
daml.sequencer-client.handler.application-handle¶
Summary: Timer monitoring time and rate of sequentially handling the event application logic
Description: All events are received sequentially. This handler records the rate and time it takes the application (participant or mediator) to handle the events.
Type: timer
Qualification: Debug
daml.sequencer-client.handler.delay¶
Summary: The delay on the event processing in milliseconds
Description: Every message received from the sequencer carries a timestamp that was assigned by the sequencer when it sequenced the message. This timestamp is called the sequencing timestamp. The component receiving the message on the participant or mediator is the sequencer client, while on the block sequencer itself, it’s the block update generator. Upon having received the same message from enough sequencers (as configured by the trust threshold), the sequencer client computes the difference between the sequencing time and the computer’s local clock and exposes this difference as the given metric. The difference will include the clock skew and the processing latency between assigning the timestamp on the sequencer and receiving the message by the recipient from enough sequencers. If the difference is large compared to the usual latencies, clock skew can be ruled out, and enough sequencers are not slow, then it means that the node is still trying to catch up with events that the sequencers sequenced a while ago. This can happen after having been offline for a while or if the node is too slow to keep up with the messaging load.
Type: gauge
Qualification: Debug
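The arithmetic behind this gauge can be illustrated with a short Python sketch. It is not Canton's implementation; the function names, the `usual_ms` baseline, and the `factor` threshold are assumptions introduced for the example.

```python
from datetime import datetime, timedelta, timezone


def handler_delay_ms(sequencing_time: datetime, local_now: datetime) -> int:
    """Local clock minus sequencing timestamp, in whole milliseconds.

    The result includes both clock skew and processing latency, so it can be
    negative if the local clock runs behind the sequencer's clock.
    """
    return (local_now - sequencing_time) // timedelta(milliseconds=1)


def is_catching_up(delay_ms: int, usual_ms: int, factor: int = 5) -> bool:
    """Hypothetical heuristic: the delay is large compared to the usual latency."""
    return delay_ms > usual_ms * factor
```

For example, a message sequenced 1.5 seconds before the local clock reading yields a delay of 1500 ms, which the heuristic flags when the usual latency is around 200 ms.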
daml.sequencer-client.handler.max-in-flight-event-batches¶
Summary: Nodes process the events from the synchronizer’s sequencer in batches. This metric tracks the upper bound of such batches being processed in parallel.
Description: Incoming messages are processed by a sequencer client, which combines them into batches of size up to ‘event-inbox-size’ before sending them to an application handler for processing. Depending on the system’s configuration, the rate at which event batches are sent to the handler may be throttled to avoid overwhelming it with too many events at once. Configured by the ‘maximum-in-flight-event-batches’ parameter in the sequencer-client config. The metric shows the configured upper limit on how many batches the application handler may process concurrently. The metric ‘actual-in-flight-event-batches’ tracks the actual number of currently processed batches.
Type: gauge
Qualification: Debug
daml.sequencer-client.handler.sequencer-events¶
Summary: Number of received events from the sequencer
Description: A participant reads events from the sequencer. This metric captures the count and rate of events.
Type: counter
Qualification: Debug
daml.sequencer-client.submissions.dropped¶
Summary: Count of send requests that did not cause an event to be sequenced
Description: Counter of send requests for which we did not witness a corresponding event being sequenced by the supplied max-sequencing-time. There could be many reasons for this happening: the request may have been lost before reaching the sequencer, the sequencer may be at capacity and the max-sequencing-time was exceeded by the time the request was processed, or the supplied max-sequencing-time may just be too small for the sequencer to be able to sequence the request.
Type: counter
Qualification: Errors
daml.sequencer-client.submissions.in-flight¶
Summary: Number of sequencer send requests we have that are waiting for an outcome or timeout
Description: Incremented on every successful send to the sequencer. Decremented when the event or an error is sequenced, or when the max-sequencing-time has elapsed.
Type: counter
Qualification: Debug
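The relationship between the in-flight and dropped counters described above can be pictured with a toy model. The class below is a hypothetical Python sketch, not the sequencer client's actual code; timestamps are plain numbers for simplicity.

```python
class InFlightTracker:
    """Toy model of the submissions.in-flight and submissions.dropped counters."""

    def __init__(self) -> None:
        self.in_flight = 0
        self.dropped = 0
        self._pending: dict[str, float] = {}  # message id -> max-sequencing-time

    def on_send_accepted(self, msg_id: str, max_sequencing_time: float) -> None:
        # Incremented on every successful send to the sequencer.
        self._pending[msg_id] = max_sequencing_time
        self.in_flight += 1

    def on_sequenced(self, msg_id: str) -> None:
        # Decremented when the event (or an error) for this send is sequenced.
        if self._pending.pop(msg_id, None) is not None:
            self.in_flight -= 1

    def on_time(self, now: float) -> None:
        # Decremented when max-sequencing-time has elapsed without an outcome;
        # such sends are also counted as dropped.
        expired = [m for m, t in self._pending.items() if t < now]
        for m in expired:
            del self._pending[m]
            self.in_flight -= 1
            self.dropped += 1
```

In this model a send leaves the in-flight set exactly once, either through a sequenced outcome or through a timeout, and only the timeout path increases the dropped count.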
daml.sequencer-client.submissions.overloaded¶
Summary: Count of send requests which receive an overloaded response
Description: Counter that is incremented if a send request receives an overloaded response from the sequencer.
Type: counter
Qualification: Errors
daml.sequencer-client.submissions.sends¶
Summary: Rate and timings of send requests to the sequencer
Description: Provides a rate and time of how long it takes for send requests to be accepted by the sequencer. Note that this is just for the request to be made and not for the requested event to actually be sequenced.
Type: timer
Qualification: Debug
daml.sequencer-client.submissions.sequencing¶
Summary: Rate and timings of sequencing requests
Description: This timer is started when a submission is made to the sequencer and completed when a corresponding event is witnessed from the sequencer, so it encompasses the entire duration for the sequencer to sequence the request. If the request does not result in an event, no timing is recorded.
Type: timer
Qualification: Latency
daml.sequencer-client.traffic-control.event-delivered¶
Summary: Number of events that were sequenced and delivered.
Description: Counter for event-delivered-cost.
Type: counter
Qualification: Traffic
daml.sequencer-client.traffic-control.event-delivered-cost¶
Summary: Cost of events that were sequenced and delivered.
Description: Cost of events for which the sender received confirmation that they were delivered. There is an exception for aggregated submissions: the cost of aggregate events will be recorded as soon as the event is ordered and the sequencer waits to receive threshold-many events. The final event may or may not be delivered successfully depending on the result of the aggregation.
Type: meter
Qualification: Traffic
daml.sequencer-client.traffic-control.event-rejected¶
Summary: Number of events that were sequenced but not delivered.
Description: Counter for event-rejected-cost.
Type: counter
Qualification: Traffic
daml.sequencer-client.traffic-control.event-rejected-cost¶
Summary: Cost of events that were sequenced but not delivered successfully.
Description: Cost of events for which the sender received confirmation that the events will not be delivered. The reason for non-delivery is labeled on the metric, if available.
Type: meter
Qualification: Traffic
daml.sequencer-client.traffic-control.submitted-event-cost¶
Summary: Cost of event submitted from the sequencer client.
Description: When the sequencer client sends an event to the sequencer to be sequenced, it will record on this metric the cost of the event. Note that the event may or may not end up being sequenced. So this metric may not exactly match the actual consumed traffic.
Type: meter
Qualification: Traffic
Sequencer Metrics¶
daml.cache.evicted_weight¶
Summary: The sum of weights of cache entries evicted.
Description: The total weight of the entries evicted from the cache.
Type: counter
Qualification: Debug
daml.cache.evictions¶
Summary: The number of the evicted cache entries.
Description: When an entry is evicted from the cache, the counter is incremented.
Type: counter
Qualification: Debug
daml.cache.hits¶
Summary: The number of cache hits.
Description: When a cache lookup encounters an existing cache entry, the counter is incremented.
Type: counter
Qualification: Debug
daml.cache.misses¶
Summary: The number of cache misses.
Description: When a cache lookup first encounters a missing cache entry, the counter is incremented.
Type: counter
Qualification: Debug
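The hit and miss counters are commonly combined into a hit ratio. The helper below is a minimal Python sketch with a hypothetical name; it assumes both values are read over the same time window.

```python
from typing import Optional


def cache_hit_ratio(hits: int, misses: int) -> Optional[float]:
    """Fraction of lookups served from the cache, or None if there were no lookups."""
    total = hits + misses
    return hits / total if total else None
```

Because both metrics are counters, it is usually more informative to compute the ratio from the increase of each counter over an interval than from their lifetime totals.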
daml.db-storage.general.executor.exectime¶
Summary: Execution time metric for database tasks
Description: The time a task is running on the database is measured using this metric.
Type: timer
Qualification: Debug
daml.db-storage.general.executor.load¶
Summary: Load of database pool
Description: Database queries run as tasks on an async executor. This metric shows the current number of queries running in parallel divided by the number of database connections for this database connection pool.
Type: gauge
Qualification: Saturation
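The load value is the ratio stated in the description. As a small worked example in Python (the function name is an assumption for illustration):

```python
def executor_load(running_queries: int, pool_connections: int) -> float:
    """Queries currently running in parallel divided by the pool's connection count."""
    if pool_connections <= 0:
        raise ValueError("pool_connections must be positive")
    return running_queries / pool_connections
```

With 6 queries running on a pool of 8 connections, the load is 0.75; a value that stays near 1.0 suggests the pool is saturated.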
daml.db-storage.general.executor.queued¶
Summary: Number of database access tasks waiting in queue
Description: Database access tasks get scheduled in this queue and get executed using one of the existing asynchronous sessions. A large queue indicates that the database connection is not able to deal with the large number of requests. Note that the queue has a maximum size. Tasks that do not fit into the queue will be retried, but won’t show up in this metric.
Type: counter
Qualification: Saturation
daml.db-storage.general.executor.running¶
Summary: Number of database access tasks currently running
Description: Database access tasks run on an async executor. This metric shows the current number of tasks running in parallel.
Type: gauge
Qualification: Debug
daml.db-storage.general.executor.waittime¶
Summary: Scheduling time metric for database tasks
Description: Every database query is scheduled using an asynchronous executor with a queue. The time a task is waiting in this queue is monitored using this metric.
Type: timer
Qualification: Debug
daml.db-storage.write.executor.exectime¶
Summary: Execution time metric for database tasks
Description: The time a task is running on the database is measured using this metric.
Type: timer
Qualification: Debug
daml.db-storage.write.executor.load¶
Summary: Load of database pool
Description: Database queries run as tasks on an async executor. This metric shows the current number of queries running in parallel divided by the number of database connections for this database connection pool.
Type: gauge
Qualification: Saturation
daml.db-storage.write.executor.queued¶
Summary: Number of database access tasks waiting in queue
Description: Database access tasks get scheduled in this queue and get executed using one of the existing asynchronous sessions. A large queue indicates that the database connection is not able to deal with the large number of requests. Note that the queue has a maximum size. Tasks that do not fit into the queue will be retried, but won’t show up in this metric.
Type: counter
Qualification: Saturation
daml.db-storage.write.executor.running¶
Summary: Number of database access tasks currently running
Description: Database access tasks run on an async executor. This metric shows the current number of tasks running in parallel.
Type: gauge
Qualification: Debug
daml.db-storage.write.executor.waittime¶
Summary: Scheduling time metric for database tasks
Description: Every database query is scheduled using an asynchronous executor with a queue. The time a task is waiting in this queue is monitored using this metric.
Type: timer
Qualification: Debug
daml.grpc.server¶
Summary: Distribution of the durations of serving gRPC requests.
Description:
Type: timer
Qualification: Latency
daml.grpc.server.handled¶
Summary: Total number of handled gRPC requests.
Description:
Type: meter
Qualification: Traffic
daml.grpc.server.messages.received¶
Summary: Total number of gRPC messages received (on either type of connection).
Description:
Type: meter
Qualification: Traffic
daml.grpc.server.messages.received.bytes¶
Summary: Distribution of payload sizes in gRPC messages received (both unary and streaming).
Description:
Type: histogram
Qualification: Traffic
daml.grpc.server.messages.sent¶
Summary: Total number of gRPC messages sent (on either type of connection).
Description:
Type: meter
Qualification: Traffic
daml.grpc.server.messages.sent.bytes¶
Summary: Distribution of payload sizes in gRPC messages sent (both unary and streaming).
Description:
Type: histogram
Qualification: Traffic
daml.grpc.server.started¶
Summary: Total number of started gRPC requests (on either type of connection).
Description:
Type: meter
Qualification: Traffic
daml.sequencer-client.handler.actual-in-flight-event-batches¶
Summary: Nodes process the events from the synchronizer’s sequencer in batches. This metric tracks how many such batches are processed in parallel.
Description: Incoming messages are processed by a sequencer client, which combines them into batches of size up to ‘event-inbox-size’ before sending them to an application handler for processing. Depending on the system’s configuration, the rate at which event batches are sent to the handler may be throttled to avoid overwhelming it with too many events at once. Indicators that the configured upper bound may be too low: this metric is constantly close to the configured maximum, which is exposed via ‘max-in-flight-event-batches’, while the system’s resources are under-utilized. Indicators that the configured upper bound may be too high: out-of-memory errors crashing the JVM or frequent garbage collection cycles that slow down processing. The metric tracks how many of these batches have been sent to the application handler but have not yet been fully processed. This metric can help identify potential bottlenecks or issues with the application’s processing of events and provide insights into the overall workload of the system.
Type: counter
Qualification: Saturation
daml.sequencer-client.handler.application-handle¶
Summary: Timer monitoring time and rate of sequentially handling the event application logic
Description: All events are received sequentially. This handler records the rate and time it takes the application (participant or mediator) to handle the events.
Type: timer
Qualification: Debug
daml.sequencer-client.handler.delay¶
Summary: The delay on the event processing in milliseconds
Description: Every message received from the sequencer carries a timestamp that was assigned by the sequencer when it sequenced the message. This timestamp is called the sequencing timestamp. The component receiving the message on the participant or mediator is the sequencer client, while on the block sequencer itself, it’s the block update generator. Upon having received the same message from enough sequencers (as configured by the trust threshold), the sequencer client computes the difference between the sequencing time and the computer’s local clock and exposes this difference as the given metric. The difference will include the clock skew and the processing latency between assigning the timestamp on the sequencer and receiving the message by the recipient from enough sequencers. If the difference is large compared to the usual latencies, clock skew can be ruled out, and enough sequencers are not slow, then it means that the node is still trying to catch up with events that the sequencers sequenced a while ago. This can happen after having been offline for a while or if the node is too slow to keep up with the messaging load.
Type: gauge
Qualification: Debug
daml.sequencer-client.handler.max-in-flight-event-batches¶
Summary: Nodes process the events from the synchronizer’s sequencer in batches. This metric tracks the upper bound of such batches being processed in parallel.
Description: Incoming messages are processed by a sequencer client, which combines them into batches of size up to ‘event-inbox-size’ before sending them to an application handler for processing. Depending on the system’s configuration, the rate at which event batches are sent to the handler may be throttled to avoid overwhelming it with too many events at once. Configured by the ‘maximum-in-flight-event-batches’ parameter in the sequencer-client config. The metric shows the configured upper limit on how many batches the application handler may process concurrently. The metric ‘actual-in-flight-event-batches’ tracks the actual number of currently processed batches.
Type: gauge
Qualification: Debug
daml.sequencer-client.handler.sequencer-events¶
Summary: Number of received events from the sequencer
Description: A participant reads events from the sequencer. This metric captures the count and rate of events.
Type: counter
Qualification: Debug
daml.sequencer-client.submissions.dropped¶
Summary: Count of send requests that did not cause an event to be sequenced
Description: Counter of send requests for which we did not witness a corresponding event being sequenced by the supplied max-sequencing-time. There could be many reasons for this happening: the request may have been lost before reaching the sequencer, the sequencer may be at capacity and the max-sequencing-time was exceeded by the time the request was processed, or the supplied max-sequencing-time may just be too small for the sequencer to be able to sequence the request.
Type: counter
Qualification: Errors
daml.sequencer-client.submissions.in-flight¶
Summary: Number of sequencer send requests we have that are waiting for an outcome or timeout
Description: Incremented on every successful send to the sequencer. Decremented when the event or an error is sequenced, or when the max-sequencing-time has elapsed.
Type: counter
Qualification: Debug
daml.sequencer-client.submissions.overloaded¶
Summary: Count of send requests which receive an overloaded response
Description: Counter that is incremented if a send request receives an overloaded response from the sequencer.
Type: counter
Qualification: Errors
daml.sequencer-client.submissions.sends¶
Summary: Rate and timings of send requests to the sequencer
Description: Provides a rate and time of how long it takes for send requests to be accepted by the sequencer. Note that this is just for the request to be made and not for the requested event to actually be sequenced.
Type: timer
Qualification: Debug
daml.sequencer-client.submissions.sequencing¶
Summary: Rate and timings of sequencing requests
Description: This timer is started when a submission is made to the sequencer and completed when a corresponding event is witnessed from the sequencer, so it encompasses the entire duration for the sequencer to sequence the request. If the request does not result in an event, no timing is recorded.
Type: timer
Qualification: Latency
daml.sequencer-client.traffic-control.event-delivered¶
Summary: Number of events that were sequenced and delivered.
Description: Counter for event-delivered-cost.
Type: counter
Qualification: Traffic
daml.sequencer-client.traffic-control.event-delivered-cost¶
Summary: Cost of events that were sequenced and delivered.
Description: Cost of events for which the sender received confirmation that they were delivered. There is an exception for aggregated submissions: the cost of aggregate events will be recorded as soon as the event is ordered and the sequencer waits to receive threshold-many events. The final event may or may not be delivered successfully depending on the result of the aggregation.
Type: meter
Qualification: Traffic
daml.sequencer-client.traffic-control.event-rejected¶
Summary: Number of events that were sequenced but not delivered.
Description: Counter for event-rejected-cost.
Type: counter
Qualification: Traffic
daml.sequencer-client.traffic-control.event-rejected-cost¶
Summary: Cost of events that were sequenced but not delivered successfully.
Description: Cost of events for which the sender received confirmation that the events will not be delivered. The reason for non-delivery is labeled on the metric, if available.
Type: meter
Qualification: Traffic
daml.sequencer-client.traffic-control.submitted-event-cost¶
Summary: Cost of event submitted from the sequencer client.
Description: When the sequencer client sends an event to the sequencer to be sequenced, it will record on this metric the cost of the event. Note that the event may or may not end up being sequenced. So this metric may not exactly match the actual consumed traffic.
Type: meter
Qualification: Traffic
daml.sequencer.bftordering.consensus.commit-latency¶
Summary: Consensus commit latency
Description: Records the rate and latency it takes to commit a block at the consensus level.
Type: timer
Qualification: Latency
daml.sequencer.bftordering.consensus.discarded-messages¶
Summary: Discarded messages
Description: Network messages received during an epoch that were discarded because they were repeated (too many retransmissions), invalid, or from a stale view.
Type: meter
Qualification: Traffic
daml.sequencer.bftordering.consensus.discarded-rate-limited-retransmission-requests¶
Summary: Discarded rate-limited retransmission requests
Description: Retransmission request messages discarded due to rate limiting.
Type: meter
Qualification: Traffic
daml.sequencer.bftordering.consensus.discarded-wrong-epoch-retransmission-responses¶
Summary: Discarded retransmission response messages
Description: Retransmission response messages discarded because they belong to an epoch different from the current one.
Type: meter
Qualification: Traffic
daml.sequencer.bftordering.consensus.epoch¶
Summary: Epoch number
Description: Current epoch number for the node.
Type: gauge
Qualification: Traffic
daml.sequencer.bftordering.consensus.epoch-length¶
Summary: Epoch length
Description: Length of the current epoch in number of blocks.
Type: gauge
Qualification: Traffic
daml.sequencer.bftordering.consensus.epoch-view-changes¶
Summary: Number of view changes that occurred
Description: Number of view changes that occurred.
Type: gauge
Qualification: Latency
daml.sequencer.bftordering.consensus.incoming-retransmission-requests¶
Summary: Incoming retransmission requests
Description: Retransmission requests received during an epoch.
Type: meter
Qualification: Traffic
daml.sequencer.bftordering.consensus.outgoing-retransmission-requests¶
Summary: Outgoing retransmission requests
Description: Retransmission requests sent during an epoch.
Type: meter
Qualification: Traffic
daml.sequencer.bftordering.consensus.postponed-view-messages-dropped¶
Summary: Count of messages dropped by queue containing postponed view messages
Description: Count of messages dropped by queue containing postponed view messages.
Type: meter
Qualification: Saturation
daml.sequencer.bftordering.consensus.postponed-view-messages-duplicates¶
Summary: Count of messages dropped as duplicates by queue containing postponed view messages
Description: Count of messages dropped as duplicates by queue containing postponed view messages.
Type: meter
Qualification: Saturation
daml.sequencer.bftordering.consensus.postponed-view-messages-queue-max-size¶
Summary: Actual maximum size of the queue containing postponed view messages
Description: Actual maximum size of the queue containing postponed view messages.
Type: gauge
Qualification: Saturation
daml.sequencer.bftordering.consensus.postponed-view-messages-queue-size¶
Summary: Size of the queue containing postponed view messages
Description: Size of the queue containing postponed view messages.
Type: gauge
Qualification: Saturation
daml.sequencer.bftordering.consensus.retransmitted-commit-certificates¶
Summary: Retransmitted commit certificates
Description: Number of commit certificates retransmitted during an epoch
Type: meter
Qualification: Traffic
daml.sequencer.bftordering.consensus.retransmitted-messages¶
Summary: Retransmitted PBFT messages
Description: Number of PBFT messages retransmitted during an epoch
Type: meter
Qualification: Traffic
daml.sequencer.bftordering.consensus.state-transfer.postponed-consensus-messages-dropped¶
Summary: Count of messages dropped by queue containing consensus messages postponed during state transfer
Description: Count of messages dropped by queue containing consensus messages postponed during state transfer.
Type: meter
Qualification: Saturation
daml.sequencer.bftordering.consensus.state-transfer.postponed-consensus-messages-queue-max-size¶
Summary: Actual maximum size of the queue containing consensus messages postponed during state transfer
Description: Actual maximum size of the queue containing consensus messages postponed during state transfer.
Type: gauge
Qualification: Saturation
daml.sequencer.bftordering.consensus.state-transfer.postponed-consensus-messages-queue-size¶
Summary: Size of the queue containing consensus messages postponed during state transfer
Description: Size of the queue containing consensus messages postponed during state transfer.
Type: gauge
Qualification: Saturation
daml.sequencer.bftordering.declarative_api.errors¶
Summary: Errors for the last update
Description: The node will attempt to apply the changes configured in the declarative config file. A positive number means that some items failed to be synchronised. A negative number means that the overall synchronisation procedure failed with an error. Values: 0 = everything good, -1 = config file unreadable, -2 = context could not be created, -3 = failure while applying items, -9 = exception caught.
Type: gauge
Qualification: Errors
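The value encoding listed in the description can be decoded for dashboards or alert messages. The following Python sketch uses only the codes given above; the function name and the wording for unlisted codes are assumptions.

```python
# Negative codes as listed in the metric description.
_FAILURE_CODES = {
    -1: "config file unreadable",
    -2: "context could not be created",
    -3: "failure while applying items",
    -9: "exception caught",
}


def describe_declarative_errors(value: int) -> str:
    """Translate the declarative_api.errors gauge value into a readable message."""
    if value == 0:
        return "everything good"
    if value > 0:
        return f"{value} item(s) failed to be synchronised"
    # Codes not listed in the description are reported as unknown (assumption).
    return _FAILURE_CODES.get(value, f"unknown failure code {value}")
```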
daml.sequencer.bftordering.declarative_api.items¶
Summary: Number of items managed through the declarative API
Description: This metric indicates the number of items managed through the declarative API
Type: gauge
Qualification: Debug
daml.sequencer.bftordering.global.ordered-blocks¶
Summary: Blocks ordered
Description: Measures the total blocks ordered.
Type: meter
Qualification: Traffic
daml.sequencer.bftordering.global.requests-ordering-latency¶
Summary: Requests ordering latency
Description: Records the rate and latency it takes to order requests. This metric is always meaningful when queried on and restricted to the receiving sequencer; in other cases, it is meaningful only when the receiving and reporting sequencers’ clocks are kept synchronized.
Type: timer
Qualification: Latency
daml.sequencer.bftordering.ingress.bytes-queued¶
Summary: Bytes queued
Description: Measures the size of the mempool in bytes.
Type: gauge
Qualification: Saturation
daml.sequencer.bftordering.ingress.received-bytes¶
Summary: Bytes received
Description: Measures the total bytes received.
Type: meter
Qualification: Traffic
daml.sequencer.bftordering.ingress.received-requests¶
Summary: Requests received
Description: Measures the total requests received.
Type: meter
Qualification: Traffic
daml.sequencer.bftordering.ingress.requests-queued¶
Summary: Requests queued
Description: Measures the size of the mempool in requests.
Type: gauge
Qualification: Saturation
daml.sequencer.bftordering.ingress.requests-size¶
Summary: Requests size
Description: Records the size of requests to the BFT ordering service.
Type: histogram
Qualification: Traffic
daml.sequencer.bftordering.mempool.requested-batches¶
Summary: Requested batches
Description: Number of batches requested from the mempool by the availability module.
Type: gauge
Qualification: Saturation
daml.sequencer.bftordering.output.block-delay¶
Summary: Block delay
Description: Wall-clock time of the ordered block being provided to the sequencer minus BFT time of the block.
Type: timer
Qualification: Latency
daml.sequencer.bftordering.output.block-size-batches¶
Summary: Block size (batches)
Description: Records the size (in batches) of blocks ordered.
Type: histogram
Qualification: Traffic
daml.sequencer.bftordering.output.block-size-bytes¶
Summary: Block size (bytes)
Description: Records the size (in bytes) of blocks ordered.
Type: histogram
Qualification: Traffic
daml.sequencer.bftordering.output.block-size-requests¶
Summary: Block size (requests)
Description: Records the size (in requests) of blocks ordered.
Type: histogram
Qualification: Traffic
daml.sequencer.bftordering.p2p.connections.authenticated¶
Summary: Authenticated peers
Description: Number of connected P2P endpoints that are also authenticated.
Type: gauge
Qualification: Traffic
daml.sequencer.bftordering.p2p.connections.connected¶
Summary: Connected peers
Description: Number of connected P2P endpoints.
Type: gauge
Qualification: Traffic
daml.sequencer.bftordering.p2p.receive.processing-latency¶
Summary: Message receive processing latency
Description: Records the rate and latency when processing incoming P2P network messages.
Type: timer
Qualification: Latency
daml.sequencer.bftordering.p2p.receive.received-bytes¶
Summary: Bytes received
Description: Total P2P bytes received.
Type: meter
Qualification: Traffic
daml.sequencer.bftordering.p2p.receive.received-messages¶
Summary: Messages received
Description: Total P2P messages received.
Type: meter
Qualification: Traffic
daml.sequencer.bftordering.p2p.send.grpc-latency¶
Summary: Latency of a gRPC message send
Description: Records the rate of gRPC message sends and their latency (up to receiving them on the other side).
Type: timer
Qualification: Latency
daml.sequencer.bftordering.p2p.send.network-write-latency¶
Summary: Message network write latency
Description: Records the rate and latency when writing P2P messages to the network.
Type: timer
Qualification: Latency
daml.sequencer.bftordering.p2p.send.sends-retried¶
Summary: P2P sends retried
Description: Total P2P network sends retried after a delay due to missing connectivity.
Type: counter
Qualification: Latency
daml.sequencer.bftordering.p2p.send.sent-bytes¶
Summary: Bytes sent
Description: Total P2P bytes sent.
Type: meter
Qualification: Traffic
daml.sequencer.bftordering.p2p.send.sent-messages¶
Summary: Messages sent
Description: Total P2P messages sent.
Type: meter
Qualification: Traffic
daml.sequencer.bftordering.performance.ordering-stage-latency¶
Summary: Ordering stage latency
Description: Records the rate and latency it takes for an ordering stage, which is recorded as a label. This metric is meaningful only when sequencers’ clocks are kept synchronized.
Type: timer
Qualification: Latency
daml.sequencer.bftordering.topology.query-latency¶
Summary: Topology query latency
Description: Records the rate and latency when querying the topology client.
Type: timer
Qualification: Latency
daml.sequencer.bftordering.topology.validators¶
Summary: Active validators
Description: Number of BFT sequencers actively involved in consensus.
Type: gauge
Qualification: Traffic
daml.sequencer.block.acknowledgments_micros*¶
Summary: Acknowledgments by members, in microseconds
Description:
Type: gauge
Qualification: Latency
- Labels:
member: The sender of the acknowledgment
daml.sequencer.block.delay¶
Summary: The block processing delay in milliseconds, relative to wall clock
Description: Every block carries a timestamp that was assigned by the ordering service when it ordered the block. This metric shows the difference between the wall clock of the sequencer node and the timestamp of the last processed block. The difference will include the clock-skew and the processing latency of the ordering service. If the delay is large compared to the usual latencies, clock skew can be ruled out, and enough sequencers are not slow, then it means that the node is still trying to catch up reading blocks from the ordering service. This can happen after having been offline for a while or if the node is too slow to keep up with the block processing load.
Type: gauge
Qualification: Latency
daml.sequencer.block.event-bytes*¶
Summary: Event bytes processed by the sequencer, tagged by type.
Description: Similar to events, except measured by bytes
Type: meter
Qualification: Traffic
- Labels:
member: The sender of the submission request
type: Type of request
daml.sequencer.block.events*¶
Summary: Events processed by the sequencer, tagged by type.
Description: The sequencer forwards opaque, possibly encrypted payload. However, by looking at the recipient list, the type of message can still be inferred, and tagged appropriately, including the sender.
Type: meter
Qualification: Traffic
- Labels:
member: The sender of the submission request
type: Type of request
daml.sequencer.block.height¶
Summary: Current block height processed
Description: The submission messages are processed in blocks, where each block has an increasing number. The metric shows the height of the last processed block by the given sequencer node.
Type: gauge
Qualification: Traffic
daml.sequencer.db-storage.general.executor.exectime¶
Summary: Execution time metric for database tasks
Description: The time a task is running on the database is measured using this metric.
Type: timer
Qualification: Debug
daml.sequencer.db-storage.general.executor.load¶
Summary: Load of database pool
Description: Database queries run as tasks on an async executor. This metric shows the current number of queries running in parallel divided by the number of database connections for this database connection pool.
Type: gauge
Qualification: Saturation
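As a minimal sketch of the ratio described above (the function and parameter names are illustrative assumptions, not part of Canton), the load value can be reproduced from the number of running queries and the pool size:

```python
def executor_load(running_queries: int, pool_connections: int) -> float:
    """Return running queries divided by the number of pool connections.

    Both arguments are hypothetical inputs; Canton computes and exposes
    the gauge itself.
    """
    if pool_connections <= 0:
        raise ValueError("pool_connections must be positive")
    return running_queries / pool_connections


# Example: 6 queries running in parallel on a pool of 8 connections.
print(executor_load(6, 8))  # 0.75
```

Under this reading, a value that stays near 1.0 suggests that all connections of the pool are busy.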
daml.sequencer.db-storage.general.executor.queued¶
Summary: Number of database access tasks waiting in queue
Description: Database access tasks get scheduled in this queue and get executed using one of the existing asynchronous sessions. A large queue indicates that the database connection is not able to deal with the large number of requests. Note that the queue has a maximum size. Tasks that do not fit into the queue will be retried, but won’t show up in this metric.
Type: counter
Qualification: Saturation
daml.sequencer.db-storage.general.executor.running¶
Summary: Number of database access tasks currently running
Description: Database access tasks run on an async executor. This metric shows the current number of tasks running in parallel.
Type: gauge
Qualification: Debug
daml.sequencer.db-storage.general.executor.waittime¶
Summary: Scheduling time metric for database tasks
Description: Every database query is scheduled using an asynchronous executor with a queue. The time a task is waiting in this queue is monitored using this metric.
Type: timer
Qualification: Debug
daml.sequencer.db-storage.write.executor.exectime¶
Summary: Execution time metric for database tasks
Description: The time a task is running on the database is measured using this metric.
Type: timer
Qualification: Debug
daml.sequencer.db-storage.write.executor.load¶
Summary: Load of database pool
Description: Database queries run as tasks on an async executor. This metric shows the current number of queries running in parallel divided by the number of database connections for this database connection pool.
Type: gauge
Qualification: Saturation
daml.sequencer.db-storage.write.executor.queued¶
Summary: Number of database access tasks waiting in queue
Description: Database access tasks get scheduled in this queue and get executed using one of the existing asynchronous sessions. A large queue indicates that the database connection is not able to deal with the large number of requests. Note that the queue has a maximum size. Tasks that do not fit into the queue will be retried, but won’t show up in this metric.
Type: counter
Qualification: Saturation
daml.sequencer.db-storage.write.executor.running¶
Summary: Number of database access tasks currently running
Description: Database access tasks run on an async executor. This metric shows the current number of tasks running in parallel.
Type: gauge
Qualification: Debug
daml.sequencer.db-storage.write.executor.waittime¶
Summary: Scheduling time metric for database tasks
Description: Every database query is scheduled using an asynchronous executor with a queue. The time a task is waiting in this queue is monitored using this metric.
Type: timer
Qualification: Debug
daml.sequencer.db.watermark_delay¶
Summary: The event processing delay in milliseconds, relative to wall clock
Description: The sequencer writes events in parallel using a watermark. This metric shows the difference between the wall clock of the sequencer node and the current watermark of the last written events. The difference will include the clock-skew and the processing latency of the sequencer database write. For block sequencers, if the delay is large compared to the usual latencies, clock skew can be ruled out, and enough sequencers are not slow, then it means that the node is still trying to catch up reading blocks from the ordering service. This can happen after having been offline for a while or if the node is too slow to keep up with the block processing load. For database sequencers, it means that the database system is not able to keep up with the write load.
Type: gauge
Qualification: Latency
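The delay described above is a plain difference between two points in time. The following is a minimal sketch of that arithmetic, assuming both values are available as timezone-aware datetimes; the function name and the sample timestamps are illustrative only:

```python
from datetime import datetime, timezone


def delay_ms(wall_clock: datetime, watermark: datetime) -> float:
    """Return wall clock minus watermark, in milliseconds.

    The result includes any clock skew between the party that assigned
    the timestamp and the local node, as noted in the description.
    """
    return (wall_clock - watermark).total_seconds() * 1000.0


# Hypothetical sample values: the watermark is 1.5 seconds behind the clock.
now = datetime(2024, 1, 1, 12, 0, 1, 500000, tzinfo=timezone.utc)
watermark = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print(delay_ms(now, watermark))  # 1500.0
```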
daml.sequencer.declarative_api.errors¶
Summary: Errors for the last update
Description: The node will attempt to apply the changes configured in the declarative config file. A positive number means that some items failed to be synchronised. A negative number means that the overall synchronisation procedure failed with an error. Values: 0 = everything good, -1 = config file unreadable, -2 = context could not be created, -3 = failure while applying items, -9 = exception caught.
Type: gauge
Qualification: Errors
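The value encoding of this gauge can be turned into a readable message, for example in an alert annotation. The following is a minimal sketch based only on the codes listed in the description; the function name and the fallback text for unlisted negative values are assumptions:

```python
# Negative codes as listed in the metric description.
OVERALL_FAILURES = {
    -1: "config file unreadable",
    -2: "context could not be created",
    -3: "failure while applying items",
    -9: "exception caught",
}


def describe_declarative_errors(value: int) -> str:
    """Translate a declarative_api.errors gauge value into text."""
    if value == 0:
        return "everything good"
    if value > 0:
        # A positive value counts the items that failed to synchronise.
        return f"{value} item(s) failed to be synchronised"
    # Fallback for codes not listed in the description (assumption).
    return OVERALL_FAILURES.get(value, f"unknown overall failure code {value}")


print(describe_declarative_errors(-2))  # context could not be created
```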
daml.sequencer.declarative_api.items¶
Summary: Number of items managed through the declarative API
Description: This metric indicates the number of items managed through the declarative API
Type: gauge
Qualification: Debug
daml.sequencer.max-event-age¶
Summary: Age of oldest unpruned sequencer event.
Description: This gauge exposes the age of the oldest, unpruned sequencer event in hours as a way to quantify the pruning backlog.
Type: gauge
Qualification: Debug
daml.sequencer.public-api.processed¶
Summary: Number of messages processed by the sequencer
Description: This metric measures the number of successfully validated messages processed by the sequencer since the start of this process.
Type: meter
Qualification: Traffic
daml.sequencer.public-api.processed-bytes¶
Summary: Number of message bytes processed by the sequencer
Description: This metric measures the total number of message bytes processed by the sequencer. If the message received by the sequencer contains duplicate or irrelevant fields, the contents of these fields do not contribute to this metric.
Type: meter
Qualification: Traffic
daml.sequencer.public-api.subscriptions¶
Summary: Number of active sequencer subscriptions
Description: This metric indicates the number of subscriptions that are currently open and actively served by the sequencer.
Type: gauge
Qualification: Traffic
daml.sequencer.public-api.time-requests¶
Summary: Number of time requests received by the sequencer
Description: When a Participant needs to know the synchronizer time it will make a request for a time proof to be sequenced. It would be normal to see a small number of these being sequenced, however if this number becomes a significant portion of the total requests to the sequencer it could indicate that the strategy for requesting times may need to be revised to deal with different clock skews and latencies between the sequencer and participants.
Type: meter
Qualification: Debug
daml.sequencer.traffic-control.balance-cache-miss-for-timestamp¶
Summary: Counts cache misses when trying to retrieve a balance for a given timestamp.
Description: The per member cache only keeps in memory a subset of all the non-pruned balance updates persisted in the database. If the cache contains some balances for a member but not the one requested, a DB call will be made to try to retrieve it. When that happens, this metric is incremented. If this occurs too frequently, consider increasing the config value of trafficPurchasedCacheSizePerMember.
Type: counter
Qualification: Debug
daml.sequencer.traffic-control.balance-update¶
Summary: Counts balance updates fully processed by the sequencer.
Description: Value of balance updates for all (aggregated).
Type: counter
Qualification: Traffic
daml.sequencer.traffic-control.event-delivered¶
Summary: Number of events that were sequenced and delivered.
Description: Counter for event-delivered-cost.
Type: counter
Qualification: Traffic
daml.sequencer.traffic-control.event-delivered-cost¶
Summary: Cost of events that were sequenced and delivered.
Description: Cost of events for which the sender received confirmation that they were delivered. There is an exception for aggregated submissions: the cost of aggregate events will be recorded as soon as the event is ordered and the sequencer waits to receive threshold-many events. The final event may or may not be delivered successfully depending on the result of the aggregation.
Type: meter
Qualification: Traffic
daml.sequencer.traffic-control.event-rejected¶
Summary: Number of events that were sequenced but not delivered.
Description: Counter for event-rejected-cost.
Type: counter
Qualification: Traffic
daml.sequencer.traffic-control.event-rejected-cost¶
Summary: Cost of events that were sequenced but not delivered successfully.
Description: Cost of events for which the sender received confirmation that the events will not be delivered. The reason for non-delivery is labeled on the metric, if available.
Type: meter
Qualification: Traffic
daml.sequencer.traffic-control.submitted-event-cost¶
Summary: Cost of event submitted from the sequencer client.
Description: When the sequencer client sends an event to the sequencer to be sequenced, it will record on this metric the cost of the event. Note that the event may or may not end up being sequenced. So this metric may not exactly match the actual consumed traffic.
Type: meter
Qualification: Traffic
daml.sequencer.traffic-control.wasted-sequencing¶
Summary: Byte size of events that got sequenced but failed to pass validation steps after sequencing
Description: Record the raw byte size of events that are ordered but were not delivered because of traffic enforcement.
Type: meter
Qualification: Traffic
daml.sequencer.traffic-control.wasted-sequencing-counter¶
Summary: Number of events that failed traffic validation and were not delivered because of it.
Description: Counter for wasted-sequencing.
Type: counter
Qualification: Traffic
daml.sequencer.traffic-control.wasted-traffic¶
Summary: Cost of event that was deducted but not delivered.
Description: Events can have their cost deducted but still not be delivered due to other failed validation after ordering. This metric records the traffic cost of such events.
Type: meter
Qualification: Traffic
daml.sequencer.traffic-control.wasted-traffic-counter¶
Summary: Number of events that cost traffic but were not delivered.
Description: Counter for wasted-traffic.
Type: counter
Qualification: Traffic
Mediator Metrics¶
daml.db-storage.general.executor.exectime¶
Summary: Execution time metric for database tasks
Description: The time a task is running on the database is measured using this metric.
Type: timer
Qualification: Debug
daml.db-storage.general.executor.load¶
Summary: Load of database pool
Description: Database queries run as tasks on an async executor. This metric shows the current number of queries running in parallel divided by the number of database connections for this database connection pool.
Type: gauge
Qualification: Saturation
daml.db-storage.general.executor.queued¶
Summary: Number of database access tasks waiting in queue
Description: Database access tasks get scheduled in this queue and get executed using one of the existing asynchronous sessions. A large queue indicates that the database connection is not able to deal with the large number of requests. Note that the queue has a maximum size. Tasks that do not fit into the queue will be retried, but won’t show up in this metric.
Type: counter
Qualification: Saturation
daml.db-storage.general.executor.running¶
Summary: Number of database access tasks currently running
Description: Database access tasks run on an async executor. This metric shows the current number of tasks running in parallel.
Type: gauge
Qualification: Debug
daml.db-storage.general.executor.waittime¶
Summary: Scheduling time metric for database tasks
Description: Every database query is scheduled using an asynchronous executor with a queue. The time a task is waiting in this queue is monitored using this metric.
Type: timer
Qualification: Debug
daml.db-storage.write.executor.exectime¶
Summary: Execution time metric for database tasks
Description: The time a task is running on the database is measured using this metric.
Type: timer
Qualification: Debug
daml.db-storage.write.executor.load¶
Summary: Load of database pool
Description: Database queries run as tasks on an async executor. This metric shows the current number of queries running in parallel divided by the number of database connections for this database connection pool.
Type: gauge
Qualification: Saturation
daml.db-storage.write.executor.queued¶
Summary: Number of database access tasks waiting in queue
Description: Database access tasks get scheduled in this queue and get executed using one of the existing asynchronous sessions. A large queue indicates that the database connection is not able to deal with the large number of requests. Note that the queue has a maximum size. Tasks that do not fit into the queue will be retried, but won’t show up in this metric.
Type: counter
Qualification: Saturation
daml.db-storage.write.executor.running¶
Summary: Number of database access tasks currently running
Description: Database access tasks run on an async executor. This metric shows the current number of tasks running in parallel.
Type: gauge
Qualification: Debug
daml.db-storage.write.executor.waittime¶
Summary: Scheduling time metric for database tasks
Description: Every database query is scheduled using an asynchronous executor with a queue. The time a task is waiting in this queue is monitored using this metric.
Type: timer
Qualification: Debug
daml.grpc.server¶
Summary: Distribution of the durations of serving gRPC requests.
Description:
Type: timer
Qualification: Latency
daml.grpc.server.handled¶
Summary: Total number of handled gRPC requests.
Description:
Type: meter
Qualification: Traffic
daml.grpc.server.messages.received¶
Summary: Total number of gRPC messages received (on either type of connection).
Description:
Type: meter
Qualification: Traffic
daml.grpc.server.messages.received.bytes¶
Summary: Distribution of payload sizes in gRPC messages received (both unary and streaming).
Description:
Type: histogram
Qualification: Traffic
daml.grpc.server.messages.sent¶
Summary: Total number of gRPC messages sent (on either type of connection).
Description:
Type: meter
Qualification: Traffic
daml.grpc.server.messages.sent.bytes¶
Summary: Distribution of payload sizes in gRPC messages sent (both unary and streaming).
Description:
Type: histogram
Qualification: Traffic
daml.grpc.server.started¶
Summary: Total number of started gRPC requests (on either type of connection).
Description:
Type: meter
Qualification: Traffic
daml.mediator.approved-requests¶
Summary: Total number of approved confirmation requests
Description: This metric provides the total number of approved confirmation requests since the system has been started. A confirmation request is approved if all the required confirmations are received by the mediator within the decision time.
Type: meter
Qualification: Debug
daml.mediator.declarative_api.errors¶
Summary: Errors for the last update
Description: The node will attempt to apply the changes configured in the declarative config file. A positive number means that some items failed to be synchronised. A negative number means that the overall synchronisation procedure failed with an error. Values: 0 = everything good, -1 = config file unreadable, -2 = context could not be created, -3 = failure while applying items, -9 = exception caught.
Type: gauge
Qualification: Errors
daml.mediator.declarative_api.items¶
Summary: Number of items managed through the declarative API
Description: This metric indicates the number of items managed through the declarative API
Type: gauge
Qualification: Debug
daml.mediator.max-event-age¶
Summary: Age of oldest unpruned confirmation response.
Description: This gauge exposes the age of the oldest, unpruned confirmation response in hours as a way to quantify the pruning backlog.
Type: gauge
Qualification: Debug
daml.mediator.outstanding-requests¶
Summary: Number of currently outstanding requests
Description: This metric provides the number of currently open requests registered with the mediator.
Type: gauge
Qualification: Debug
daml.mediator.requests¶
Summary: Total number of processed confirmation requests (approved and rejected)
Description: This metric provides the number of processed confirmation requests since the system has been started.
Type: meter
Qualification: Debug
daml.sequencer-client.handler.actual-in-flight-event-batches¶
Summary: Nodes process the events from the synchronizer’s sequencer in batches. This metric tracks how many such batches are processed in parallel.
Description: Incoming messages are processed by a sequencer client, which combines them into batches of size up to ‘event-inbox-size’ before sending them to an application handler for processing. Depending on the system’s configuration, the rate at which event batches are sent to the handler may be throttled to avoid overwhelming it with too many events at once. The metric tracks how many of these batches have been sent to the application handler but have not yet been fully processed. Indicators that the configured upper bound may be too low: this metric is constantly close to the configured maximum, which is exposed via ‘max-in-flight-event-batches’, while the system’s resources are under-utilized. Indicators that the configured upper bound may be too high: out-of-memory errors crashing the JVM, or frequent garbage collection cycles that slow down processing. This metric can help identify potential bottlenecks or issues with the application’s processing of events and provide insights into the overall workload of the system.
Type: counter
Qualification: Saturation
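The "bound may be too low" indicator can be checked mechanically over a window of samples. The following is a minimal sketch, assuming the samples of ‘actual-in-flight-event-batches’ and the value of ‘max-in-flight-event-batches’ have already been fetched from the monitoring system; the function name and the 0.9 threshold are illustrative assumptions, and the check does not cover the second condition (under-utilized resources), which has to be verified separately:

```python
from typing import Sequence


def bound_looks_too_low(
    actual_samples: Sequence[int], maximum: int, threshold: float = 0.9
) -> bool:
    """Return True if every sample is at or above threshold * maximum.

    'Constantly close to the configured maximum' is approximated by
    requiring all samples in the window to reach the threshold
    (assumed value, tune to taste).
    """
    if maximum <= 0:
        raise ValueError("maximum must be positive")
    if not actual_samples:
        return False
    return all(sample >= threshold * maximum for sample in actual_samples)


# Hypothetical window of samples against a configured maximum of 10.
print(bound_looks_too_low([10, 10, 10], 10))  # True
print(bound_looks_too_low([10, 3, 10], 10))   # False
```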
daml.sequencer-client.handler.application-handle¶
Summary: Timer monitoring time and rate of sequentially handling the event application logic
Description: All events are received sequentially. This handler records the rate and time it takes the application (participant or mediator) to handle the events.
Type: timer
Qualification: Debug
daml.sequencer-client.handler.delay¶
Summary: The delay on the event processing in milliseconds
Description: Every message received from the sequencer carries a timestamp that was assigned by the sequencer when it sequenced the message. This timestamp is called the sequencing timestamp. The component receiving the message on the participant or mediator is the sequencer client, while on the block sequencer itself, it’s the block update generator. Upon having received the same message from enough sequencers (as configured by the trust threshold), the sequencer client compares the time difference between the sequencing time and the computer’s local clock and exposes this difference as the given metric. The difference will include the clock-skew and the processing latency between assigning the timestamp on the sequencer and receiving the message by the recipient from enough sequencers. If the difference is large compared to the usual latencies, clock skew can be ruled out, and enough sequencers are not slow, then it means that the node is still trying to catch up with events that the sequencers sequenced a while ago. This can happen after having been offline for a while or if the node is too slow to keep up with the messaging load.
Type: gauge
Qualification: Debug
daml.sequencer-client.handler.max-in-flight-event-batches¶
Summary: Nodes process the events from the synchronizer’s sequencer in batches. This metric tracks the upper bound of such batches being processed in parallel.
Description: Incoming messages are processed by a sequencer client, which combines them into batches of size up to ‘event-inbox-size’ before sending them to an application handler for processing. Depending on the system’s configuration, the rate at which event batches are sent to the handler may be throttled to avoid overwhelming it with too many events at once. The limit is configured by the ‘maximum-in-flight-event-batches’ parameter in the sequencer-client config. The metric shows the configured upper limit on how many batches the application handler may process concurrently. The metric ‘actual-in-flight-event-batches’ tracks the actual number of currently processed batches.
Type: gauge
Qualification: Debug
daml.sequencer-client.handler.sequencer-events¶
Summary: Number of received events from the sequencer
Description: A participant reads events from the sequencer. This metric captures the count and rate of events.
Type: counter
Qualification: Debug
daml.sequencer-client.submissions.dropped¶
Summary: Count of send requests that did not cause an event to be sequenced
Description: Counter of send requests for which we did not witness a corresponding event being sequenced by the supplied max-sequencing-time. There could be many reasons for this happening: the request may have been lost before reaching the sequencer, the sequencer may be at capacity and the max-sequencing-time was exceeded by the time the request was processed, or the supplied max-sequencing-time may just be too small for the sequencer to be able to sequence the request.
Type: counter
Qualification: Errors
daml.sequencer-client.submissions.in-flight¶
Summary: Number of sequencer send requests we have that are waiting for an outcome or timeout
Description: Incremented on every successful send to the sequencer. Decremented when the event or an error is sequenced, or when the max-sequencing-time has elapsed.
Type: counter
Qualification: Debug
daml.sequencer-client.submissions.overloaded¶
Summary: Count of send requests which receive an overloaded response
Description: Counter that is incremented if a send request receives an overloaded response from the sequencer.
Type: counter
Qualification: Errors
daml.sequencer-client.submissions.sends¶
Summary: Rate and timings of send requests to the sequencer
Description: Provides a rate and time of how long it takes for send requests to be accepted by the sequencer. Note that this is just for the request to be made and not for the requested event to actually be sequenced.
Type: timer
Qualification: Debug
daml.sequencer-client.submissions.sequencing¶
Summary: Rate and timings of sequencing requests
Description: This timer is started when a submission is made to the sequencer and then completed when a corresponding event is witnessed from the sequencer, so will encompass the entire duration for the sequencer to sequence the request. If the request does not result in an event no timing will be recorded.
Type: timer
Qualification: Latency
daml.sequencer-client.traffic-control.event-delivered¶
Summary: Number of events that were sequenced and delivered.
Description: Counter for event-delivered-cost.
Type: counter
Qualification: Traffic
daml.sequencer-client.traffic-control.event-delivered-cost¶
Summary: Cost of events that were sequenced and delivered.
Description: Cost of events for which the sender received confirmation that they were delivered. There is an exception for aggregated submissions: the cost of aggregate events will be recorded as soon as the event is ordered and the sequencer waits to receive threshold-many events. The final event may or may not be delivered successfully depending on the result of the aggregation.
Type: meter
Qualification: Traffic
daml.sequencer-client.traffic-control.event-rejected¶
Summary: Number of events that were sequenced but not delivered.
Description: Counter for event-rejected-cost.
Type: counter
Qualification: Traffic
daml.sequencer-client.traffic-control.event-rejected-cost¶
Summary: Cost of events that were sequenced but not delivered successfully.
Description: Cost of events for which the sender received confirmation that the events will not be delivered. The reason for non-delivery is labeled on the metric, if available.
Type: meter
Qualification: Traffic
daml.sequencer-client.traffic-control.submitted-event-cost¶
Summary: Cost of event submitted from the sequencer client.
Description: When the sequencer client sends an event to the sequencer to be sequenced, it will record on this metric the cost of the event. Note that the event may or may not end up being sequenced. So this metric may not exactly match the actual consumed traffic.
Type: meter
Qualification: Traffic
Health Metrics¶
The following metrics are exposed for all components.
daml_health_status¶
Description: The status of the component
Values:
0: Not healthy
1: Healthy
Labels:
component: the name of the component being monitored
Type: Gauge
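As a minimal sketch of how this gauge might be consumed, the following snippet scans metrics text in the Prometheus exposition format and lists the components whose status is not 1. The sample text and the component names in it are hypothetical; only the metric name, the component label, and the 0/1 values come from the description above:

```python
import re

# Matches lines such as: daml_health_status{component="x"} 1.0
_SAMPLE = re.compile(r'^daml_health_status\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)')
_LABEL = re.compile(r'(\w+)="([^"]*)"')


def unhealthy_components(exposition: str) -> list[str]:
    """Return the component label of every sample that is not healthy (1)."""
    result = []
    for line in exposition.splitlines():
        match = _SAMPLE.match(line.strip())
        if match is None:
            continue  # comment lines and other metrics are skipped
        labels = dict(_LABEL.findall(match.group("labels")))
        if float(match.group("value")) != 1.0:
            result.append(labels.get("component", "<unknown>"))
    return result


# Hypothetical scrape output; component names are made up for illustration.
sample = """
# TYPE daml_health_status gauge
daml_health_status{component="component-a"} 1.0
daml_health_status{component="component-b"} 0.0
"""
print(unhealthy_components(sample))  # ['component-b']
```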
gRPC Metrics¶
The following metrics are exposed for all gRPC endpoints. These metrics have the following common labels attached:
- grpc_service_name: fully qualified name of the gRPC service (e.g. com.daml.ledger.api.v1.ActiveContractsService)
- grpc_method_name: name of the gRPC method (e.g. GetActiveContracts)
- grpc_client_type: type of client connection (unary or streaming)
- grpc_server_type: type of server connection (unary or streaming)
- service: Canton service’s name (e.g. participant, sequencer, etc.)
daml_grpc_server_duration_seconds¶
Description: Distribution of the durations of serving gRPC requests
Type: Histogram
daml_grpc_server_messages_sent_total¶
Description: Total number of gRPC messages sent (on either type of connection)
Type: Counter
daml_grpc_server_messages_received_total¶
Description: Total number of gRPC messages received (on either type of connection)
Type: Counter
daml_grpc_server_started_total¶
Description: Total number of started gRPC requests (on either type of connection)
Type: Counter
daml_grpc_server_handled_total¶
Description: Total number of handled gRPC requests
Labels:
grpc_code: returned gRPC status code for the call (OK, CANCELLED, INVALID_ARGUMENT, etc.)
Type: Counter
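Because the handled-requests counter is labeled by grpc_code, an error ratio can be derived from it. The following is a minimal sketch, assuming the per-code counts have already been read from the monitoring system into a dictionary; the function name is illustrative, and treating every code other than OK (including CANCELLED) as an error is an assumption that may not suit every deployment:

```python
def grpc_error_ratio(handled_by_code: dict[str, float]) -> float:
    """Return the share of handled requests whose grpc_code is not OK."""
    total = sum(handled_by_code.values())
    if total == 0:
        return 0.0  # no traffic, so no errors to report
    ok = handled_by_code.get("OK", 0.0)
    return (total - ok) / total


# Hypothetical counts per grpc_code label value.
counts = {"OK": 90, "CANCELLED": 5, "INVALID_ARGUMENT": 5}
print(grpc_error_ratio(counts))  # 0.1
```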
daml_grpc_server_messages_sent_bytes¶
Description: Distribution of payload sizes in gRPC messages sent (both unary and streaming)
Type: Histogram
daml_grpc_server_messages_received_bytes¶
Description: Distribution of payload sizes in gRPC messages received (both unary and streaming)
Type: Histogram
HTTP Metrics¶
The following metrics are exposed for all HTTP endpoints. These metrics have the following common labels attached:
- http_verb: HTTP verb used for a given call (e.g. GET or PUT)
- host: fully qualified hostname of the HTTP endpoint (e.g. example.com)
- path: path of the HTTP endpoint (e.g. /v2/parties)
- service: Daml service’s name (json_api for the JSON Ledger API Service)
daml_http_requests_duration_seconds¶
Description: Distribution of the durations of serving HTTP requests
Type: Histogram
daml_http_requests_total¶
Description: Total number of HTTP requests completed
Labels:
http_status: returned HTTP status code for the call
Type: Counter
daml_http_websocket_messages_received_total¶
Description: Total number of WebSocket messages received
Type: Counter
daml_http_websocket_messages_sent_total¶
Description: Total number of WebSocket messages sent
Type: Counter
daml_http_requests_payload_bytes¶
Description: Distribution of payload sizes in HTTP requests received
Type: Histogram
daml_http_responses_payload_bytes¶
Description: Distribution of payload sizes in HTTP responses sent
Type: Histogram
daml_http_websocket_messages_received_bytes¶
Description: Distribution of payload sizes in WebSocket messages received
Type: Histogram
daml_http_websocket_messages_sent_bytes¶
Description: Distribution of payload sizes in WebSocket messages sent
Type: Histogram
Pruning Metrics¶
The following metrics are exposed for all pruning processes. These metrics have the following labels:
- phase: The name of the pruning phase being monitored
daml_services_pruning_prune_started_total¶
Description: Total number of started pruning processes
Type: Counter
daml_services_pruning_prune_completed_total¶
Description: Total number of completed pruning processes
Type: Counter
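Comparing the two counters per phase shows how many pruning processes were started but have not (yet) completed. The following is a minimal sketch, assuming the counter values per phase label have already been collected into dictionaries; the function name and the phase names in the example are hypothetical, and a non-zero difference may mean either "still running" or "did not finish", which the counters alone cannot distinguish:

```python
def pruning_not_completed(
    started: dict[str, int], completed: dict[str, int]
) -> dict[str, int]:
    """Return started minus completed for every phase seen in 'started'."""
    return {
        phase: count - completed.get(phase, 0)
        for phase, count in started.items()
    }


# Hypothetical phase names and counter values.
started = {"phase-a": 12, "phase-b": 7}
completed = {"phase-a": 12, "phase-b": 6}
print(pruning_not_completed(started, completed))  # {'phase-a': 0, 'phase-b': 1}
```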
JVM Metrics¶
The following metrics are exposed for the JVM, if enabled.
runtime_jvm_gc_time¶
Description: Time spent in a given JVM garbage collector in milliseconds
Labels:
gc: Garbage collector regions (e.g. G1 Old Generation, G1 New Generation)
Type: Counter
runtime_jvm_gc_count¶
Description: The number of collections that have occurred for a given JVM garbage collector
Labels:
gc: Garbage collector regions (e.g. G1 Old Generation, G1 New Generation)
Type: Counter
runtime_jvm_memory_area¶
Description: JVM memory area statistics
Labels:
area: Can be heap or non_heap
type: Can be committed, used or max
runtime_jvm_memory_pool¶
Description: JVM memory pool statistics
Labels:
pool: Defined pool name.
type: Can be committed, used or max
Logging¶
Canton uses Logback as the logging library. All Canton logs derive from the logger com.digitalasset.canton. By default, Canton will write a log to the file log/canton.log using the INFO log-level and will also log WARN and ERROR to stdout.
How Canton produces log files can be configured extensively on the command line using the following options:
- -v (or --verbose) is a short option to set the Canton log level to DEBUG. This is likely the most common log option you will use.
- --debug sets all log levels except stdout to DEBUG. Stdout is set to INFO. Note that DEBUG logs of external libraries can be very noisy.
- --log-level-root=<level> configures the log-level of the root logger. This changes the log level of Canton and of external libraries, but not of stdout.
- --log-level-canton=<level> configures the log-level of only the Canton logger.
- --log-level-stdout=<level> configures the log-level of stdout. This will usually be the text displayed in the Canton console.
- --log-file-name=log/canton.log configures the location of the log file.
- --log-file-appender=flat|rolling|off configures if and how logging to a file should be done. The rolling appender will roll the files according to the defined date-time pattern.
- --log-file-rolling-history=12 configures the number of historical files to keep when using the rolling appender.
- --log-file-rolling-pattern=YYYY-mm-dd configures the rolling file suffix (and therefore the frequency) of how files should be rolled.
- --log-truncate configures whether the log file should be truncated on startup.
- --log-profile=container provides a default set of logging settings for a particular setup. Only the container profile is supported, which logs to both STDOUT and to 10-hour limited rolling log files history (to avoid storage leaks).
- --log-immediate-flush=false turns off immediate flushing of the log output to the log file.
Note that if you use --log-profile, the order of the command line arguments matters. The profile settings can be overridden on the command line by placing adjustments after the profile has been selected.
Canton supports the normal log4j logging levels: TRACE, DEBUG, INFO, WARN, and ERROR.
For further customization, a custom logback configuration can be provided using JAVA_OPTS:
JAVA_OPTS="-Dlogback.configurationFile=./path-to-file.xml" ./bin/canton --config ...
If you use a custom log-file, the command line arguments for logging will not have any effect, except that --log-level-canton and --log-level-root can still be used to adjust the log level of the root loggers.
Viewing Logs¶
A log file viewer such as lnav is recommended to view Canton logs and resolve issues. Among other features, lnav has automatic syntax highlighting, convenient filtering for specific log messages, and the ability to view log files of different Canton components in a single view. This makes viewing logs and resolving issues more efficient than using standard UNIX tools such as less or grep.
The following features are especially useful when using lnav:
- Viewing log files of different Canton components in a single view, merged according to timestamps (lnav <log1> <log2> ...).
- Filtering specific log messages in (:filter-in <regex>) or out (:filter-out <regex>). When filtering messages (for example, with a given trace-id), a transaction can be traced across different components, especially when using the single-view-feature described earlier.
- Searching for specific log messages (/<regex>) and jumping between them (n and N).
- Automatic syntax highlighting of parts of log messages (such as timestamps) and log messages themselves (for example, WARN log messages are yellow).
- Jumping between error (e and E) and warn messages (w and W).
- Selectively activating and deactivating different filters and files (TAB and SPACE to activate/deactivate a filter).
- Marking lines (m) and jumping back and forth between marked lines (u and U).
- Jumping back and forth between lines that have the same trace-id (o and O).
The custom lnav log format file for Canton logs, canton.lnav.json, is bundled with every Canton release. You can install it with lnav -i canton.lnav.json. JSON-based log files (which need to use the file suffix .clog) can be viewed using the canton-json.lnav.json format file.
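When lnav is not available, the two operations described above (merging several component logs by timestamp and filtering on a trace-id) can be approximated with a short script. The following is a minimal Python sketch, not part of Canton. It assumes that every log line starts with a lexicographically sortable timestamp and that the trace-id appears verbatim in the line. The function names and the sample log lines are hypothetical.

```python
import heapq
from typing import Iterable, Iterator, List


def merge_by_timestamp(*logs: Iterable[str]) -> Iterator[str]:
    """Merge already-sorted log streams into one chronologically ordered stream.

    Assumption: each line starts with a sortable timestamp (for example
    ISO-8601), so plain string comparison orders the lines by time.
    """
    return heapq.merge(*logs)


def filter_by_trace_id(lines: Iterable[str], trace_id: str) -> List[str]:
    """Keep only the lines that mention the given trace-id."""
    return [line for line in lines if trace_id in line]


if __name__ == "__main__":
    # Hypothetical sample lines; real Canton log lines look different.
    participant_log = [
        "2024-01-01T10:00:00 participant1 tid:abc submitting command",
        "2024-01-01T10:00:02 participant1 tid:def unrelated request",
    ]
    sequencer_log = [
        "2024-01-01T10:00:01 sequencer1 tid:abc request sequenced",
    ]
    merged = merge_by_timestamp(participant_log, sequencer_log)
    for line in filter_by_trace_id(merged, "abc"):
        print(line)
```

Multi-line entries such as stack traces do not start with a timestamp, so this sketch would misplace them. lnav handles such entries through its log format file.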
Detailed Logging¶
By default, logging omits details to avoid writing sensitive data into log files. For debugging or educational purposes, you can turn on additional logging using the following configuration switches:
canton.monitoring.logging {
  event-details = true
  api {
    message-payloads = true
    max-method-length = 1000
    max-message-lines = 10000
    max-string-length = 10000
    max-metadata-size = 10000
  }
}
This turns on payload logging in the ApiRequestLogger, which records every gRPC API invocation, and turns on detailed logging of the SequencerClient and the transaction trees. All additional events are logged at DEBUG level.
Note
Detailed event logging happens within a gRPC API interceptor. This creates a sequential bottleneck because every message that is sent or received is translated into a pretty-printed string. Expect lower performance while this setting is turned on.
Tracing¶
For further debugging, Canton provides a trace-id, which allows you to trace the processing of requests through the system. The trace-id is exposed to logback through the mapped diagnostic context (MDC) and can be included in the logback output pattern using %mdc{trace-id}.
Trace-id propagation is controlled by the canton.monitoring.tracing.propagation configuration option, which is set to enabled by default.
You can configure the service where traces and spans are reported for observing distributed traces. Refer to Traces for a preview.
Jaeger and Zipkin are supported. For example, Jaeger reporting can be configured as follows:
monitoring.tracing.tracer.exporter {
  type = jaeger
  address = ... // default: "localhost"
  port = ... // default: 14250
}
This configuration connects to a running Jaeger server to report tracing information.
You can run Jaeger in a Docker container as follows:
docker run --rm -it --name jaeger \
  -p 16686:16686 \
  -p 14250:14250 \
  jaegertracing/all-in-one:1.22.0
If you prefer not to use Docker, you can download the binary for your specific OS at Download Jaeger. Unzip the file and then run the binary jaeger-all-in-one (no arguments are needed). By default, Jaeger will expose port 16686 (for its UI, which can be seen in a browser window) and port 14250 (to which Canton will report trace information). Be sure to properly expose these ports.
Make sure that all Canton nodes in the network report to the same Jaeger server to have an accurate view of the full traces. Also, ensure that the Jaeger server is reachable by all Canton nodes.
Apart from Jaeger, Canton nodes can also be configured to report in Zipkin or OTLP formats.
Sampling¶
You can change how often spans are sampled and reported to the configured exporter. By default, every span is reported (monitoring.tracing.tracer.sampler.type = always-on). You can configure it to never report (monitoring.tracing.tracer.sampler.type = always-off), although this is less useful. You can also configure only a specific fraction of spans to be reported as follows:
monitoring.tracing.tracer.sampler = {
  type = trace-id-ratio
  ratio = 0.5
}
You can also change the parent-based sampling property. By default, it is turned on (monitoring.tracing.tracer.sampler.parent-based = true). When turned on, a span is sampled if and only if its parent is sampled (the root span follows the configured sampling strategy). As a result, there are never incomplete traces: either the full trace is sampled or none of it is. If you turn this property off, all spans follow the configured sampling strategy and ignore whether the parent is sampled.
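The interaction between the ratio sampler and the parent-based property can be illustrated with a small decision function. The following Python sketch is an assumption-laden model of how such samplers commonly work, not Canton's implementation: it treats the trace id as an integer, compares its lower 64 bits against a bound derived from the ratio, and lets a known parent decision take precedence when parent-based sampling is on. All names are hypothetical.

```python
from typing import Optional

_MASK_64 = (1 << 64) - 1


def ratio_sampled(trace_id: int, ratio: float) -> bool:
    """Deterministic ratio decision: the same trace id always gives the same answer."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    bound = int(ratio * (1 << 64))
    return (trace_id & _MASK_64) < bound


def should_sample(
    trace_id: int,
    ratio: float,
    parent_sampled: Optional[bool] = None,
    parent_based: bool = True,
) -> bool:
    """Decide whether a span is reported.

    parent_sampled is None for a root span. With parent_based on, a child
    span copies its parent's decision; otherwise the ratio strategy applies.
    """
    if parent_based and parent_sampled is not None:
        return parent_sampled
    return ratio_sampled(trace_id, ratio)
```

Because the ratio decision depends only on the trace id, all spans of one trace reach the same decision in this model even with the parent-based property turned off. With other strategies, turning the property off can produce partially reported traces.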
Known Limitations¶
Not every trace that can be observed in the logs is reported to the configured trace collector service. Traces that originate from console commands or that are part of the transaction protocol are largely reported, while other types of traces are added to the set of reported traces as the need arises.
Also, the transaction protocol trace has a known limitation: once a command is submitted and its trace is fully reported, a new trace is created for any resulting Daml events that are processed. This occurs because the Ledger API does not propagate any trace context information from the command submission to the transaction subscription. As an example, when a participant creates a Ping contract, you can see the full transaction processing trace of the Ping command being submitted. However, a participant that processes the Ping by exercising Respond and creating a Pong contract creates a separate trace instead of using the same one.
This differs from a situation where a single Daml transaction results in multiple actions at the same time, such as archiving and creating multiple contracts. In that case, a single trace encompasses the entire process, since it occurs as part of a single transaction rather than the result of an external process reacting to Daml events.
Traces¶
Traces contain operations that are each represented by a span. A trace is a directed acyclic graph (DAG) of spans, where the edges between spans are defined as parent/child relationships (the definitions come from the OpenTelemetry glossary).
Canton reports several types of traces. One example: every Canton console command that interacts with the Admin API starts a trace whose initial span lasts for the entire duration of the command, including the gRPC call to the specific Admin API endpoint.

Graph of a Canton ping trace containing 18 spans¶
Traces of Daml command submissions are important. The trace illustrated in the figure results when you perform a Canton ping using the console. The ping is a smoke test that sends a Daml transaction (create Ping, exercise choice Pong, exercise choice Archive) to test a connection. It uses a particular smart contract that is preinstalled on every Canton participant. The command uses the Admin API to access a preinstalled application, which then issues Ledger API commands operating on this smart contract. In this example, the trace contains 18 spans. The ping is started by participant1, and participant2 is the target. The trace focuses on the message exchange through the sequencer without digging deep into the message handlers or further processing of transactions.
In some cases, spans may start later than the end of their parents, due to asynchronous processing. This typically occurs when a new operation is placed on a queue to be handled later, which immediately frees the parent span and ends it.
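To make the parent/child structure and the asynchronous case concrete, the following Python sketch models spans as records with a parent reference and lists the spans that start only after their parent has ended. It is an illustration only: the Span type, the field names, and the timings are hypothetical and are not taken from a real Canton trace.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Span:
    span_id: int
    parent_id: Optional[int]  # None marks the root span of the trace
    start_ms: int
    end_ms: int


def starts_after_parent_ended(spans: Iterable[Span]) -> List[int]:
    """Return the ids of spans whose start time is later than their parent's end time."""
    span_list = list(spans)
    by_id: Dict[int, Span] = {span.span_id: span for span in span_list}
    late = []
    for span in span_list:
        parent = by_id.get(span.parent_id) if span.parent_id is not None else None
        if parent is not None and span.start_ms > parent.end_ms:
            late.append(span.span_id)
    return late


if __name__ == "__main__":
    # Hypothetical timings: the work of span 4 is queued by span 3,
    # which ends before the queued work starts.
    trace = [
        Span(1, None, 0, 100),
        Span(2, 1, 5, 95),
        Span(3, 2, 10, 20),
        Span(4, 3, 30, 40),
    ]
    print(starts_after_parent_ended(trace))
```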
The initial span (span 1) covers the duration of the ping operation. In span 2, the GrpcPingService in the participant node handles a gRPC request made by the console. It also lasts for the duration of the ping operation.
The Canton ping consists of three Daml commands:
The admin party for participant1 creates a Ping contract.
The admin party for participant2 exercises the Respond consuming choice on the contract, which results in the creation of a Pong contract.
The admin party for participant1 exercises the Ack consuming choice on it.
The submission of the first of the three Daml commands (the creation of the Ping contract) starts at span 3 in the example trace. Due to the limitation explained in Known Limitations, the other two Daml command submissions are not linked to this trace, although you can find them separately. In any case, span 2 completes only once all three Daml commands are completed.
At span 3, the participant node is on the client side of the Ledger API. In other use cases, it could be an application integrated with the participant. This span lasts for the duration of the gRPC call, which is received on the server side in span 4 and handled by the CantonSyncService in span 5. The request is then received and acknowledged, but not fully processed. It is processed asynchronously later, which means that spans 3 through 5 complete before the request is handled.
Missing steps from the trace (which account for part of the gap between spans 5 and 6) are:
The synchronizer routing where the participant decides which synchronizer to use for the command submission.
The preparation of the initial set of messages to be sent.
The Canton transaction protocol begins at span 6. In this span, participant1 sends a request to sequencer1 to sequence the initial set of confirmation request messages as part of phase 1 of the transaction protocol. The transaction protocol has seven phases.
At span 7, sequencer1 receives the request and registers it. Receipt of the messages is not part of this span. That happens asynchronously at a later point.
At span 18, as part of phase 2, mediator1 receives an informee message. It only needs to validate and register it. Since it doesn’t need to respond, span 18 has no children.
As part of phase 3, participant2 receives a message (see span 8), and participant1 also receives a message (see span 9). Both participants asynchronously validate the messages. participant2 does not need to respond because it is only an observer, so span 8 has no children. participant1 responds, however, which is visible at span 10. There, it again makes a call to sequencer1, which receives it at span 11.
At span 12, participant1 receives a successful send response message that signals that its message to the mediator was successfully sequenced. This occurs as part of phase 4, where confirmation responses are sent to the mediator. The mediator receives it at span 13, and it validates the message (phase 5).
In spans 14 and 15, mediator1 (now at phase 6) asks sequencer1 to send the transaction result messages to the participants.
To end this round of the transaction protocol, participant1 and participant2 receive their messages at spans 16 and 17, respectively. The messages are asynchronously validated, and their projections of the virtual shared ledger are updated (phase 7).
As mentioned, there are two other transaction submissions that are unlinked from this ping trace but are part of the operation.
The second one starts at a span titled admin-ping.processTransaction, which is created by participant2. The third one has the same name but is initiated by participant1.
Node Health Status¶
Each Canton node exposes rich health status information. Running:
<node>.health.status
returns a status object, which can be one of:
Failure: if the status of the node cannot be determined, including an error message of why it failed
NotInitialized: if the node is not yet initialized
Success[NodeStatus]: if the status could be determined, including the detailed status
The NodeStatus differs depending on the node type. A participant node responds with a message containing:
Participant id: the participant id of the node
Uptime: the uptime of this node
Ports: the ports on which the participant node exposes the Ledger and the Admin API
Connected synchronizers: the list of synchronizers to which the participant is properly connected
Unhealthy synchronizers: the list of synchronizers to which the participant is trying to connect, but the connection is not ready for command submission
Active: true if this instance is the active replica (it can be false in the case of the passive instance of a high-availability deployment)
A synchronizer node or a sequencer node responds with a message containing:
Synchronizer id: the unique identifier of the synchronizer
Uptime: the uptime of this node
Ports: the ports on which the synchronizer exposes the Public and the Admin API
Connected Participants: the list of connected participants
Sequencer: a boolean flag indicating whether the embedded sequencer writer is operational
A sequencer node also returns the following additional field starting from Canton 2.8.6:
Accepts admin changes: a boolean flag indicating whether the sequencer accepts admin changes
A synchronizer topology manager or a mediator node returns:
Node uid: the unique identifier of the node
Uptime: the uptime of this node
Ports: the ports on which the node hosts its APIs
Active: true if this instance is the active replica (it can be false in the case of the passive instance of a high-availability deployment)
Additionally, all nodes return a components field detailing the health state of each of the node's internal runtime dependencies. The actual components differ per node and can give further insights into the node’s current status. Example components include storage access, synchronizer connectivity, and sequencer backend connectivity.
Health Checks¶
gRPC Health Check Service¶
Each Canton node can optionally be configured to start a gRPC server exposing the gRPC Health Service. Passive nodes (see High Availability for more information on active/passive states) return NOT_SERVING. Consider this when configuring liveness and readiness probes in a Kubernetes environment.
The precise way the state is computed is subject to change.
Here is an example monitoring configuration to place inside a node configuration object:
monitoring.grpc-health-server {
  address = "127.0.0.1"
  port = 5861
}
Note
The gRPC health server is configured per Canton node, not per process, as is the case for the HTTP health check server (see below). This means that the configuration must be inserted within a node’s configuration object.
Note
To support usage as a Kubernetes liveness probe, the health server exposes a service named liveness that should be targeted when configuring a gRPC probe. This service always returns SERVING.
HTTP Health Check¶
Optionally, the canton process can expose an HTTP endpoint indicating whether the process believes it is healthy. This may be used as an uptime check or as a Kubernetes liveness probe. If enabled, the /health endpoint responds to a GET HTTP request with a 200 HTTP status code (if healthy) or 500 (if unhealthy, along with a plain text description of why it is unhealthy).
To enable this health endpoint, add a monitoring section to the Canton configuration. Since this health check is for the whole process, add it directly to the canton configuration rather than for a specific node.
canton {
  monitoring.health {
    server {
      port = 7000
    }
    check {
      type = ping
      participant = participant1
      interval = 30s
    }
  }
}
This health check causes participant1 to “ledger ping” itself every 30 seconds. The process is considered healthy if the ping is successful.
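An external uptime check can poll this endpoint and interpret the status codes described above. The following Python sketch uses only the standard library and is not part of Canton. The function name is hypothetical, and the URL you pass in (for example http://localhost:7000/health for the configuration above) depends on where the process is reachable in your deployment.

```python
import urllib.error
import urllib.request
from typing import Tuple


def check_health(url: str, timeout_s: float = 5.0) -> Tuple[bool, str]:
    """Query a health endpoint and return (healthy, detail).

    A 200 response counts as healthy. A 500 response counts as unhealthy,
    and its plain text body is returned as the reason.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            body = response.read().decode("utf-8", errors="replace")
            return response.status == 200, body
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for error status codes such as 500;
        # the response body carries the description of the problem.
        return False, err.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, OSError) as err:
        # The endpoint could not be reached at all (refused, timed out, ...).
        return False, f"endpoint unreachable: {err}"
```

Treating an unreachable endpoint as unhealthy is a design choice of this sketch. A monitoring setup may prefer to report that case separately, since it can also indicate a network problem rather than an unhealthy process.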
Health Dumps¶
To receive efficient support, provide as much information as possible. For this purpose, Canton implements an information-gathering facility that collects essential system information for support staff. If you encounter an error where you need assistance, follow these steps:
Start Canton in interactive mode, with the -v option to enable debug logging: ./bin/canton -v -c <myconfig>. This provides a console prompt.
Reproduce the error by following the steps that previously caused the error. Write down these steps so they can be provided to support staff.
After you observe the error, type health.dump() into the Canton console to generate a ZIP file.
This creates a dump file (.zip) that stores the following information:
The configuration you are using, with all sensitive data stripped from it (no passwords).
An extract of the log file. Sensitive data is not logged into log files.
A current snapshot of Canton metrics.
A stacktrace for each running thread.
Provide the gathered information to your support contact together with the exact list of steps that led to the issue. Providing complete information is very important to help troubleshoot issues.
Remote Health Dumps¶
When running a console configured to access remote nodes, the health.dump() command gathers health data from the remote nodes and packages them into resulting zip files. There is no special action required. You can obtain the health data of a specific node by targeting it when running the command. For example:
remoteParticipant1.health.dump()
When packaging large amounts of data, increase the default timeout of the dump command:
health.dump(timeout = 2.minutes)
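Before sending a dump to your support contact, you may want to review what it contains. The following Python sketch lists the entries of a dump file with the standard zipfile module. It makes no assumption about the names of the files inside the archive. The function name and the example path are hypothetical.

```python
import zipfile
from typing import List, Tuple


def list_dump_entries(dump_path: str) -> List[Tuple[str, int]]:
    """Return (entry name, uncompressed size in bytes) for each file in the archive."""
    with zipfile.ZipFile(dump_path) as archive:
        return sorted(
            (info.filename, info.file_size)
            for info in archive.infolist()
            if not info.is_dir()
        )


if __name__ == "__main__":
    import sys

    # Usage: python list_dump.py <path-to-dump.zip>
    if len(sys.argv) == 2:
        for name, size in list_dump_entries(sys.argv[1]):
            print(f"{size:>12}  {name}")
```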