Pipeline Health

The Pipeline Monitor page shows the live state of the processing pipeline. It refreshes automatically every five seconds and displays how many records are waiting at each stage, how fast records are flowing through, which batches are currently running, and whether any stage has backed up messages that need attention. Use it for at-a-glance health checks and as the first stop when a sync seems slower than expected.

Pipeline Monitor showing the five-stage flow with message counts

Accessing the page

In the left navigation, under Operations, select Pipeline Monitor.

A Live 5s indicator in the page header confirms the display is refreshing in real time. The header also shows the current total number of records in flight across all stages.

Reading the pipeline flow

The Pipeline Flow card shows one box per processing stage, arranged left to right in the order records move through them:

graph LR
    A[Ingest] --> B[Normalize]
    B --> C[Transform]
    C --> D[Execute]
    D --> E[Finalize]
    D -->|on failure| F[Error]

Each stage box displays:

Indicator	Meaning
Stage name	The human-readable label for this processing step.
Message count	Number of records currently waiting to be processed at this stage.
Status dot	Green when the count is low; turns amber or red as the count grows.
Consumer count	Number of active workers processing records at this stage.
Throughput	Publish rate and delivery rate in messages per second, shown as `in/out msg/s`.

Below each stage box, a smaller error count box appears if any records have failed at that stage and are awaiting attention. These records have not been automatically retried.

What a healthy pipeline looks like

All stage boxes show a low or zero message count.
Status dots are green.
Throughput numbers are non-zero when a sync is running.
Active Batches shows running batches with progress advancing over time.
The Recent Failed Runs table is empty or shows a low failure percentage.

Warning signs

Sign	Likely cause	What to do
One stage has a large and growing message count	Processing is backed up at that stage — workers may be stalled or a downstream system is slow	Check Sync Logs for error messages; escalate if messages do not drain within a few minutes
Error count box appears under a stage	Records failed at that stage and need manual action	Open DLQ and Replay to inspect and replay or discard them
All stage boxes show zero consumers	Workers are not running	Contact your system administrator
Active batch shows no progress for several minutes	The batch may be stalled	Use the Purge option (admin only) to cancel the stalled batch, then re-trigger the run
”Unable to connect” message in the flow card	The health monitoring service cannot reach the pipeline backend	Contact your system administrator

Active batches

The Active Batches card lists every batch sync currently in progress. For each batch you can see:

The integration instance name and entity type being synced.
How the batch was triggered (scheduled, manual, or dependency cascade).
Time elapsed since the batch started.
A progress bar showing records processed out of total, split into green (success), grey (skipped), and red (failed) segments.
The number of failures, if any.

When a batch is in the extraction phase and the total record count is not yet known, the progress bar pulses blue with an “Extracting…” label.

Administrators can cancel a single stalled batch using the Purge button on its card, or cancel all running batches at once using Purge All. Both actions are irreversible — pending records in the cancelled batch are discarded.

Failure alerting policy

The Failure Alerting Policy card shows the global thresholds that trigger an incident notification when a run produces too many failures:

Field	Description
Min Failures	Minimum number of failed records before an incident is raised.
Min Failure Rate	Minimum failure percentage before an incident is raised.
Category Dominance	Minimum percentage of failures that must belong to one error category for a digest to be sent.
Digest Interval	How often (in minutes) the system sends a failure digest email.
Max Digests/Run	Maximum number of digest emails sent per run.

These thresholds apply globally. Administrators can set per-instance overrides in the Per-Instance Alert Overrides card below it — useful when a specific instance tolerates a higher failure rate than the system default.

Recent failed runs

The Recent Failed Runs table shows integration runs that ended with a high failure percentage. For each run you can see the instance name, entity type, final status, failed record count out of total, the failure percentage, and the start time. Use this table for incident triage when you need a quick list of which jobs are failing without searching through Sync Logs.

When to escalate

Self-service actions available from this page:

Replay failed records from the DLQ (see DLQ and Replay).
Purge a stalled batch (admin role required).
Adjust per-instance alert thresholds (admin role required).

Escalate to your system administrator when:

Workers show zero consumers and do not recover after a few minutes.
The pipeline flow card shows a connection error.
A stage queue continues to grow without draining after replaying or purging.

Sync Logs — per-record synchronization history
DLQ and Replay — replay or discard failed records
Audit Log — configuration change history
Common Errors — error patterns and resolutions