Observability

Telemetry Posture

This page describes how operational visibility is approached in the Vorantiq operating environment. It distinguishes what is wired and live, what is scaffolded and no-op safe, what is intentionally not enabled, and what is planned. It implies no maturity beyond what has been verified.

Pre-launch posture

Runtime telemetry primitives are shipped and no-op safe. No external observability vendor is wired into the production deployment today; the integration code is vendor-neutral and activates only when the deployment-time environment variables are set. Until that activation lands, errors are captured via structured logging and the per-request correlation id is the join key for support investigations.

Status legend

Complete

Wired and live in production runtime.

Scaffolded

Code is shipped and no-op safe. Not yet active against an external surface.

Not enabled

Code is shipped, but the deployment-time environment variable required to activate the capability is intentionally unset in production today.

Architected

Designed in a tracked B-document; implementation pending.

Planned

Acknowledged future work without a current implementation.

Blocked

Cannot advance until the active Production-Safety Stop is lifted.

Capabilities

Per-request correlation

Complete

X-Request-ID propagation, contextvar binding, response echo. W3C traceparent compatible.
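As an illustration of the three steps named above (propagation, contextvar binding, response echo), here is a minimal sketch. The function names and the plain-dict header handling are assumptions made for this example and are not taken from the Vorantiq codebase.

```python
import contextvars
import uuid

# Hypothetical module-level context variable; the real name is not documented here.
_request_id = contextvars.ContextVar("request_id", default=None)


def bind_request_id(headers):
    """Reuse an inbound X-Request-ID or mint a new one, then bind it to the context."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    request_id = lowered.get("x-request-id") or uuid.uuid4().hex
    token = _request_id.set(request_id)
    return request_id, token


def current_request_id():
    """Return the id bound to the current context, or None outside a request."""
    return _request_id.get()


def echo_request_id(response_headers):
    """Copy the bound id onto the outgoing response headers."""
    request_id = _request_id.get()
    if request_id is not None:
        response_headers["X-Request-ID"] = request_id
    return response_headers


def handle(headers):
    """Toy request handler: bind, do work, echo, and always unbind."""
    _, token = bind_request_id(headers)
    try:
        return echo_request_id({})
    finally:
        _request_id.reset(token)
```

Resetting the token in a finally block keeps one request's id from leaking into the next unit of work that runs in the same context.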

Distributed tracing

Scaffolded

Vendor-neutral OpenTelemetry adapter. trace_operation decorator + context manager used across orchestration and runtime.
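A rough sketch of how a helper that works as both a decorator and a context manager can stay no-op safe. Only the name trace_operation comes from this page; its signature, the tracer name, and the accepted OTEL_ENABLED values are assumptions. The sketch falls back to a no-op when the opentelemetry package is not installed or the switch is unset.

```python
import contextlib
import os

try:
    # Vendor-neutral API package only; no exporter is imported here.
    from opentelemetry import trace as _otel_trace
except ImportError:
    _otel_trace = None


def _tracing_enabled():
    # Assumption: "1" or "true" turns tracing on; anything else leaves it off.
    flag = os.environ.get("OTEL_ENABLED", "").strip().lower()
    return _otel_trace is not None and flag in ("1", "true")


@contextlib.contextmanager
def trace_operation(name):
    """Open a span named `name`, or do nothing when tracing is inactive."""
    if not _tracing_enabled():
        yield None
        return
    tracer = _otel_trace.get_tracer("vorantiq.sketch")  # hypothetical tracer name
    with tracer.start_as_current_span(name) as span:
        yield span


# Objects produced by contextlib.contextmanager also work as decorators.
@trace_operation("orchestration.plan")
def plan(steps):
    return [step.upper() for step in steps]


def run():
    with trace_operation("runtime.execute") as span:
        return plan(["fetch", "rank"]), span
```

With OTEL_ENABLED unset, run() returns the planned steps and a span of None, which is the no-op path.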

Structured logging

Scaffolded

JSON formatter, context filter (request id + tenant id), domain-specific helpers for api / agent / workflow / provider / security events.
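The formatter and context filter described above could look roughly like the following, built on the standard logging module. The class names, the logger name, and the output field set are assumptions for illustration; the domain-specific helpers are omitted.

```python
import contextvars
import io
import json
import logging

# Hypothetical context variables carrying the per-request fields.
request_id_var = contextvars.ContextVar("request_id", default=None)
tenant_id_var = contextvars.ContextVar("tenant_id", default=None)


class ContextFilter(logging.Filter):
    """Attach the current request id and tenant id to every record."""

    def filter(self, record):
        record.request_id = request_id_var.get()
        record.tenant_id = tenant_id_var.get()
        return True  # never drops a record, only enriches it


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "request_id": getattr(record, "request_id", None),
            "tenant_id": getattr(record, "tenant_id", None),
        }
        return json.dumps(payload, sort_keys=True)


def build_logger(stream):
    """Wire the filter and formatter onto a stream handler."""
    logger = logging.getLogger("vorantiq.sketch.api")  # hypothetical logger name
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    handler.addFilter(ContextFilter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

Because the filter reads context variables at emit time, call sites do not need to pass the request id or tenant id explicitly.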

Metrics registry

Scaffolded

Counter / Gauge / Histogram primitives and a Prometheus-compatible endpoint helper.
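For readers unfamiliar with the pattern, a minimal sketch of registry primitives that render the Prometheus text exposition format. All class and method names are assumptions, and the Histogram primitive is left out for brevity.

```python
import threading


class _Metric:
    kind = "untyped"

    def __init__(self, name, help_text):
        self.name = name
        self.help_text = help_text
        self._value = 0.0
        self._lock = threading.Lock()

    def render(self):
        """Return this metric in Prometheus text exposition format."""
        with self._lock:
            value = self._value
        return (
            f"# HELP {self.name} {self.help_text}\n"
            f"# TYPE {self.name} {self.kind}\n"
            f"{self.name} {value}\n"
        )


class Counter(_Metric):
    kind = "counter"

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("a counter can only increase")
        with self._lock:
            self._value += amount


class Gauge(_Metric):
    kind = "gauge"

    def set(self, value):
        with self._lock:
            self._value = float(value)


class Registry:
    """Holds metrics and renders them together for a scrape endpoint."""

    def __init__(self):
        self._metrics = []

    def register(self, metric):
        self._metrics.append(metric)
        return metric

    def render(self):
        return "".join(metric.render() for metric in self._metrics)
```

An endpoint helper would return Registry.render() as the response body; in this sketch nothing is exposed until such a route is registered.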

Health probe

Complete

/health liveness with dependency checks (database, redis, resend, auth router). Verified live in production.
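A simplified sketch of how a probe can aggregate dependency checks into one response. The check callables, status strings, and response shape are hypothetical; the real /health payload is not documented on this page.

```python
def run_health_checks(checks):
    """Run each named check and fold the results into one status document.

    `checks` maps a dependency name to a zero-argument callable that
    returns True when the dependency is reachable.
    """
    results = {}
    for name, check in checks.items():
        try:
            results[name] = "ok" if check() else "degraded"
        except Exception as exc:
            # A failing check must not crash the probe itself.
            results[name] = f"error: {type(exc).__name__}"
    overall = "ok" if all(value == "ok" for value in results.values()) else "degraded"
    return {"status": overall, "checks": results}


def _unreachable():
    raise ConnectionError("simulated outage")


# Stub checks standing in for the database, redis, resend, and auth router probes.
example = run_health_checks(
    {
        "database": lambda: True,
        "redis": lambda: True,
        "resend": _unreachable,
        "auth_router": lambda: True,
    }
)
```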

Audit-chain telemetry

Blocked

Per-tenant SHA-256 hash chain over canonical audit events. Code complete; live-DB activation is blocked by the Production-Safety Stop.
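To make the hash-chain idea concrete, here is a minimal sketch of a per-tenant chain in which each entry's SHA-256 digest covers the previous digest plus the canonical form of the event. The canonicalization rule (sorted-key compact JSON), the all-zero genesis value, and the entry layout are assumptions; the shipped schema may differ.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for an empty chain


def canonical(event):
    """Serialize an event deterministically so equal events hash equally."""
    return json.dumps(event, sort_keys=True, separators=(",", ":")).encode("utf-8")


def link_hash(prev_hash, event):
    """Digest of the previous link concatenated with the canonical event."""
    return hashlib.sha256(prev_hash.encode("ascii") + canonical(event)).hexdigest()


def append(chain, event):
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"event": event, "prev_hash": prev, "hash": link_hash(prev, event)})


def verify(chain):
    """Recompute every link; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev:
            return False
        if entry["hash"] != link_hash(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True


# One independent chain per tenant id.
chains = {}
append(chains.setdefault("tenant-a", []), {"action": "login", "actor": "u1"})
append(chains.setdefault("tenant-a", []), {"action": "export", "actor": "u1"})
append(chains.setdefault("tenant-b", []), {"action": "login", "actor": "u9"})
```

Keeping one chain per tenant means a verification failure can be localized to a single tenant's history.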

Error aggregation

Not enabled

Frontend Sentry SDK is shipped with three runtime configs (client / server / edge); error boundaries call captureException. NEXT_PUBLIC_SENTRY_DSN is unset in production, so the SDK initializes as no-op. Backend SDK integration is design-track B.5.3.

PII scrubber

Architected

Scrubber for span attributes and structured-log payloads is architected and lands before any live tracing backend is enabled.
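Since the scrubber is architected but not implemented, the following is only an illustration of the general technique and not the tracked design. The key list, the email pattern, and the redaction marker are all assumptions.

```python
import re

# Hypothetical deny-list of attribute names whose values are always redacted.
SENSITIVE_KEYS = {"password", "authorization", "access_token", "refresh_token", "secret"}
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
REDACTED = "[REDACTED]"


def scrub(value, key=None):
    """Return a copy of `value` with sensitive keys and email addresses masked."""
    if key is not None and str(key).lower() in SENSITIVE_KEYS:
        return REDACTED
    if isinstance(value, dict):
        return {k: scrub(v, k) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [scrub(item) for item in value]
    if isinstance(value, str):
        return EMAIL_RE.sub(REDACTED, value)
    return value  # numbers, booleans, and None pass through unchanged
```

A scrubber of this kind would sit between the call site and the exporter or log handler, so it acts on payloads after call-site discipline has already been applied.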

Honest gaps

These are the deficits an enterprise SRE reviewer should expect to see closed before regulated-industry adoption. Each is sourced to docs/observability/README.md and reflected in /security gaps.

No live distributed-tracing backend. OpenTelemetry adapter is real and vendor-neutral; no exporter endpoint configured today.

No production telemetry vendor enabled. Honeycomb / Datadog / Tempo selection is design-track B.5.1.

No latency heatmaps or p50/p95/p99 dashboards. Depends on the live tracing backend.

No runtime topology maps. Depends on the live tracing backend.

No live SIEM integration. Audit-event SIEM forward is unscheduled.

Production metrics endpoint is not registered. Code is ready; exposure is design-track B.5.2.

Frontend Sentry SDK shipped but not active: DSN unset in production. Backend Sentry SDK not yet integrated.

PII scrubber for spans/logs is architected, not implemented.

Audit-events live-DB activation is blocked by the Production-Safety Stop.

Vendor-neutrality posture

The observability code never imports a vendor-specific exporter package. Activation is purely environment-driven: an OTEL_ENABLED switch plus the standard OTel variables OTEL_SERVICE_NAME, OTEL_EXPORTER_OTLP_ENDPOINT, and OTEL_EXPORTER_OTLP_HEADERS. A customer-private deployment can route traces to its own collector without a code change; the adapter cannot phone home to a vendor we did not document.
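A minimal sketch of environment-driven activation using the four variables named above. The settings class, the accepted truthy values for OTEL_ENABLED, and the fallback service name are assumptions; the header parsing follows the comma-separated key=value form used by OTEL_EXPORTER_OTLP_HEADERS.

```python
import os
from dataclasses import dataclass, field


@dataclass(frozen=True)
class OtelSettings:  # hypothetical container, not the adapter's real type
    service_name: str
    endpoint: str
    headers: dict = field(default_factory=dict)


def parse_headers(raw):
    """Parse 'k1=v1,k2=v2' into a dict, skipping malformed pairs."""
    headers = {}
    for pair in raw.split(","):
        if "=" in pair:
            key, value = pair.split("=", 1)
            headers[key.strip()] = value.strip()
    return headers


def load_settings(env=None):
    """Return settings when tracing is switched on, otherwise None (no-op)."""
    env = os.environ if env is None else env
    # Assumption: these spellings count as "on"; anything else stays off.
    if env.get("OTEL_ENABLED", "").strip().lower() not in ("1", "true", "yes"):
        return None
    endpoint = env.get("OTEL_EXPORTER_OTLP_ENDPOINT", "").strip()
    if not endpoint:
        return None  # enabled without a destination still stays inert
    return OtelSettings(
        service_name=env.get("OTEL_SERVICE_NAME", "unknown_service"),
        endpoint=endpoint,
        headers=parse_headers(env.get("OTEL_EXPORTER_OTLP_HEADERS", "")),
    )
```

Returning None on every unset or incomplete configuration is what keeps the default deployment inert: no endpoint is ever assumed in code.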

Privacy constraints

Correlation ids are not secrets, not PII, and safe to log. Structured-log call sites must not log secrets, OAuth tokens, password material, or full request bodies on sensitive surfaces; the PII scrubber is the second line of defense. Trace span attributes must not carry tenant-isolating PII; tenant ids are the resolution boundary. The canonical audit-events schema does not store unbounded free-text payloads — fields are typed to the action category.
