Troubleshoot
When things go wrong, it is important to be able to diagnose the problem quickly and effectively. This section aims to get an operator up to speed with the most common troubleshooting techniques and tools available in PQS. You might want to refer to it for ideas when devising your own troubleshooting procedures.
PQS application (pipeline process) quick facts:
- exports ledger events into a queryable data store
- does not send ledger commands
- is stateless
- is restart friendly (fast restarts in the absence of migrations, Daml model changes, etc.)
- is tolerant to unavailable dependencies (through a retry loop)
- uses only 1 Ledger API stream connection (flat transaction or transaction tree) after initialisation
- uses a pool of connections to Postgres (16 by default)
- can be secured with TLS on both connections
- uses the OpenTelemetry Agent to export its observability signals
- can export a diagnostics archive (with metrics and thread dumps over time)
Look into exit codes of the PQS process or the Docker/Kubernetes container orchestrator:
- 137 indicates the process was killed by external forces with SIGKILL (-9)
- a non-zero exit might indicate invalid starting conditions, which are treated as non-recoverable errors. Causes might include:
  - misspelled startup parameter names or values
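
The 137 value follows the common shell and container convention of reporting 128 plus the signal number when a process is terminated by a signal (128 + 9 for SIGKILL). A minimal sketch of that decoding is shown here; the helper name `describe_exit_code` and its message strings are hypothetical and not part of PQS:

```python
import signal


def describe_exit_code(code: int) -> str:
    """Classify an exit code using the 128 + signal-number convention."""
    if code == 0:
        return "clean exit"
    if code > 128:
        signum = code - 128
        try:
            # Map the signal number to its symbolic name, e.g. 9 -> SIGKILL.
            name = signal.Signals(signum).name
        except ValueError:
            name = f"signal {signum}"
        return f"terminated by {name}"
    # Any other non-zero code: treat as a startup or runtime failure.
    return "non-zero exit: check startup parameters and preceding log lines"


if __name__ == "__main__":
    for code in (0, 1, 137, 143):
        print(code, "->", describe_exit_code(code))
```

Feeding it the exit code reported by the container orchestrator gives a quick first classification before turning to the logs.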
Look into logs for activity indicators:
- ledger keep-alives are present
- watermark advances in the presence of expected ledger traffic, see also here and here
- retry loop indicates recoverable errors (both upstream and downstream); examine the message for an indication of the underlying cause
- in case of non-recoverable errors, keep in mind that the last visible stack trace does not necessarily represent the true root cause; explore the events that preceded it by requesting a bigger slice of logs before the termination
Look into metrics for a detailed breakdown of PQS internals:
- correlate transactions (pipeline_events_total{type="transaction"}) and watermark (watermark_ix) throughput metrics to identify if any slowdowns are present in the PQS pipeline
- get an idea of the latency introduced by the PQS pipeline - see here
- get an idea of contract churn (which correlates with write activity of PQS) by template - see here
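
One way to correlate the two throughput series is to turn periodic samples of each metric into per-interval rates and flag the intervals where the rate drops well below its typical value. The sketch below assumes both metrics can be sampled as increasing values over time. The function names, the sample data and the 0.5 threshold are hypothetical illustrations, not PQS defaults:

```python
from statistics import median
from typing import List, Tuple

# (unix timestamp in seconds, metric value at that time)
Sample = Tuple[float, float]


def interval_rates(samples: List[Sample]) -> List[float]:
    """Rate of change between each pair of consecutive samples."""
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if t1 <= t0:
            raise ValueError("timestamps must be strictly increasing")
        rates.append((v1 - v0) / (t1 - t0))
    return rates


def slow_intervals(samples: List[Sample], fraction: float = 0.5) -> List[int]:
    """Indexes of intervals whose rate is below `fraction` of the median rate."""
    rates = interval_rates(samples)
    if not rates:
        return []
    threshold = fraction * median(rates)
    return [i for i, rate in enumerate(rates) if rate < threshold]


if __name__ == "__main__":
    # Hypothetical samples taken every 10 seconds.
    transactions = [(0, 0), (10, 100), (20, 200), (30, 210), (40, 310)]
    watermark = [(0, 1000), (10, 1100), (20, 1200), (30, 1205), (40, 1300)]
    print("slow transaction intervals:", slow_intervals(transactions))
    print("slow watermark intervals:", slow_intervals(watermark))
```

Running the same check on both series and comparing the flagged intervals shows whether the two slow down together or independently, which helps narrow down where to look next.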
Look into database to get familiar with Daml model footprint:
Look into database statistics for resource utilisation:
- get an idea of I/O split - disk vs index, cache sizing (tables and indexes)
- inspect if bgwriter flush triggers too frequently
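
For the I/O split, PostgreSQL exposes per-table block counters in the pg_statio_user_tables view. A minimal sketch of turning those counters into cache hit ratios is shown below; the helper names and the sample row are hypothetical, and the query has to be run with whichever Postgres client is already in use:

```python
from typing import Dict, Optional

# Columns come from PostgreSQL's pg_statio_user_tables statistics view.
STATIO_QUERY = """
SELECT relname, heap_blks_read, heap_blks_hit, idx_blks_read, idx_blks_hit
FROM pg_statio_user_tables
"""


def hit_ratio(blocks_hit: int, blocks_read: int) -> Optional[float]:
    """Share of block requests served from shared buffers, or None if idle."""
    total = blocks_hit + blocks_read
    return blocks_hit / total if total else None


def summarize(row: Dict[str, object]) -> Dict[str, Optional[float]]:
    """Heap and index cache hit ratios for one row of the view."""
    # Index counters can be NULL (None) for tables without indexes.
    return {
        "heap": hit_ratio(row["heap_blks_hit"] or 0, row["heap_blks_read"] or 0),
        "index": hit_ratio(row["idx_blks_hit"] or 0, row["idx_blks_read"] or 0),
    }


if __name__ == "__main__":
    # Hypothetical row, shaped like the result of STATIO_QUERY.
    row = {
        "relname": "example_table",
        "heap_blks_read": 100,
        "heap_blks_hit": 900,
        "idx_blks_read": None,
        "idx_blks_hit": None,
    }
    print(row["relname"], summarize(row))
```

A persistently low ratio on a busy table suggests that the cache is undersized for the working set. The pg_stat_bgwriter view can be sampled over time in a similar way for the background writer check.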
Try correlating representative metrics between PQS & Canton (if available).
To escalate issues to Digital Asset’s support team, please provide forensics by collecting a diagnostics dump as close to the incident time as possible and attaching the resulting archive to the support ticket.