Observability
Health endpoints
- GET /healthz: process liveness
- GET /readyz: readiness across PostgreSQL, Redis, and buckets
- GET /metrics: Prometheus metrics for the main server
Any readiness failure increments vylux_readiness_failures_total{check=...}.
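A quick way to check both probes from a script is sketched below. It uses only the Python standard library and assumes that /readyz answers with a non-2xx status (for example 503) when a dependency check fails; this page does not state the exact status codes. The base URL you pass in is up to you (for example a hypothetical http://localhost:8080).

```python
import urllib.error
import urllib.request


def probe(base_url: str, path: str, timeout: float = 3.0) -> tuple[bool, int]:
    """Return (ok, status) for GET base_url + path; ok means a 2xx response."""
    try:
        with urllib.request.urlopen(base_url.rstrip("/") + path, timeout=timeout) as resp:
            return 200 <= resp.status < 300, resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for non-2xx responses; keep the status code.
        return False, err.code
    except (urllib.error.URLError, TimeoutError):
        # Connection refused, DNS failure, or timeout: no HTTP status available.
        return False, 0


def check_server(base_url: str) -> dict[str, tuple[bool, int]]:
    """Probe liveness and readiness of the main server."""
    return {path: probe(base_url, path) for path in ("/healthz", "/readyz")}
```

A result such as {"/healthz": (True, 200), "/readyz": (False, 503)} matches the first troubleshooting hint at the end of this page: the process is alive, but a dependency is unreachable.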
Worker metrics
When Vylux runs in worker-only mode (--mode=worker), it can expose a separate listener on WORKER_METRICS_PORT that serves worker metrics and basic health checks.
For the exact defaults and validation rules of WORKER_METRICS_PORT and OTEL_EXPORTER_OTLP_ENDPOINT, see Configuration.
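To confirm that the worker listener is actually publishing worker metrics, a sketch like the following fetches its metrics page and reports which vylux_worker_* families have samples. It assumes the listener serves Prometheus text on a /metrics path on localhost, which this page does not spell out.

```python
import urllib.request

# Worker families taken from the metric table on this page.
WORKER_FAMILIES = ("vylux_worker_tasks_total", "vylux_worker_task_duration_seconds")


def worker_families_present(metrics_text: str) -> dict[str, bool]:
    """Report which worker metric families have at least one sample line."""
    samples = [ln for ln in metrics_text.splitlines() if ln and not ln.startswith("#")]
    return {fam: any(ln.startswith(fam) for ln in samples) for fam in WORKER_FAMILIES}


def fetch_worker_metrics(port: int, host: str = "localhost") -> str:
    """Fetch the worker listener's metrics page (the /metrics path is an assumption)."""
    if port == 0:
        # The troubleshooting notes treat 0 as "no worker metrics"; refuse it early.
        raise ValueError("WORKER_METRICS_PORT is 0; there is no worker listener to scrape")
    with urllib.request.urlopen(f"http://{host}:{port}/metrics", timeout=3) as resp:
        return resp.read().decode("utf-8")
```

For example, call worker_families_present(fetch_worker_metrics(int(os.environ["WORKER_METRICS_PORT"]))) from a shell where the worker's environment is loaded.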
Prometheus metric families
The most useful metric families today are:
| Metric | Meaning |
|---|---|
| vylux_http_requests_total | HTTP request count by method, route, and status |
| vylux_http_request_duration_seconds | HTTP request latency, in seconds |
| vylux_image_cache_events_total | Image cache hits and misses by layer |
| vylux_image_results_total | Top-level image request outcomes |
| vylux_image_errors_total | Image failures by stage and status |
| vylux_worker_tasks_total | Worker task attempts by task type and result |
| vylux_worker_task_duration_seconds | Worker task latency, in seconds |
| vylux_readiness_failures_total | Readiness failures by dependency check |
| vylux_queue_tasks | Queue depth by queue and state |
| vylux_queue_metrics_sync_failures_total | Failures while refreshing queue-depth metrics |
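Counters such as vylux_image_cache_events_total are easiest to read as ratios. The sketch below parses a Prometheus text-format scrape with a deliberately simple regex (it does not handle escaped quotes or a closing brace inside label values) and computes a hit ratio per cache layer. The label names layer and result, and the result values hit and miss, are hypothetical; check a real scrape for the actual labels.

```python
import re
from collections import defaultdict

# name{labels} value [timestamp]; simplified: no escaped quotes or "}" in label values.
SAMPLE_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{([^}]*)\})?\s+(\S+)")
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_samples(text: str, metric: str) -> list[tuple[dict[str, str], float]]:
    """Return (labels, value) pairs for every sample of the given metric name."""
    out = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        m = SAMPLE_RE.match(line)
        if m and m.group(1) == metric:
            out.append((dict(LABEL_RE.findall(m.group(2) or "")), float(m.group(3))))
    return out


def cache_hit_ratio_by_layer(text: str) -> dict[str, float]:
    """Hit ratio per layer, assuming hypothetical labels layer=... and result=hit|miss."""
    counts = defaultdict(lambda: {"hit": 0.0, "miss": 0.0})
    for labels, value in parse_samples(text, "vylux_image_cache_events_total"):
        if labels.get("result") in ("hit", "miss"):
            counts[labels.get("layer", "")][labels["result"]] += value
    return {
        layer: c["hit"] / (c["hit"] + c["miss"])
        for layer, c in counts.items()
        if c["hit"] + c["miss"] > 0
    }
```

Because counters are cumulative since process start, a ratio from a single scrape reflects lifetime behavior; for recent behavior, diff two scrapes or use rate() in PromQL.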
Tracing
OpenTelemetry tracing is integrated across HTTP requests and queued media tasks. The system propagates trace context into async workflows so job execution is visible as part of the same trace tree.
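The W3C traceparent header has the form 00-traceid-parentid-flags, with a 32-hex-digit trace ID, a 16-hex-digit parent span ID, and 2 hex digits of flags. The sketch below illustrates the general idea of carrying it through a queued job: the enqueuing side stores the header in the job payload, and the worker continues the same trace under a new span ID. It is a standard-library illustration of the W3C format, not Vylux's implementation; the payload key trace_context is hypothetical, and a real service would normally use the OpenTelemetry SDK's propagators rather than hand-rolling this.

```python
from __future__ import annotations

import re
import secrets

# W3C traceparent, version 00 only: 00-<32 hex trace-id>-<16 hex parent-id>-<2 hex flags>
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def parse_traceparent(value: str) -> tuple[str, str, str] | None:
    """Return (trace_id, parent_id, flags), or None if the header is unusable."""
    m = TRACEPARENT_RE.match(value.strip())
    if not m:
        return None
    trace_id, parent_id, flags = m.groups()
    # The spec treats all-zero trace IDs and parent IDs as invalid.
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return trace_id, parent_id, flags


def enqueue_job(payload: dict, traceparent: str) -> dict:
    """Enqueue side: copy the request's trace context into the job (hypothetical key)."""
    return {**payload, "trace_context": {"traceparent": traceparent}}


def worker_traceparent(job: dict) -> str | None:
    """Worker side: continue the same trace under a freshly generated span ID."""
    parsed = parse_traceparent(job.get("trace_context", {}).get("traceparent", ""))
    if parsed is None:
        return None  # No usable context; the worker would start a new trace instead.
    trace_id, _parent_id, flags = parsed
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"
```

Keeping the trace ID while minting a new span ID is what lets the job's spans appear in the same trace tree as the originating HTTP request.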
Relevant headers
- traceparent
- tracestate
- X-Trace-ID
X-Trace-ID is a convenience header for manual debugging and log correlation. The authoritative context still comes from the W3C trace headers.
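When correlating a response with logs or Jaeger, a script can prefer the W3C context and fall back to X-Trace-ID. The minimal sketch below assumes that X-Trace-ID carries the same 32-hex-digit trace ID as traceparent; this page calls it a convenience header but does not specify its format.

```python
from __future__ import annotations

import re

TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-[0-9a-f]{16}-[0-9a-f]{2}$")
TRACE_ID_RE = re.compile(r"^[0-9a-f]{32}$")


def trace_id_from_headers(headers: dict[str, str]) -> str | None:
    """Return the trace ID, preferring traceparent over X-Trace-ID."""
    # HTTP header names are case-insensitive, so normalize them before lookup.
    lowered = {k.lower(): v.strip() for k, v in headers.items()}
    m = TRACEPARENT_RE.match(lowered.get("traceparent", ""))
    if m:
        return m.group(1)
    # Assumption: X-Trace-ID holds a 32-hex-digit trace ID; normalize to lowercase.
    fallback = lowered.get("x-trace-id", "").lower()
    return fallback if TRACE_ID_RE.match(fallback) else None
```

With urllib, pass dict(response.headers) as the argument.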
Enabling export
Set OTEL_EXPORTER_OTLP_ENDPOINT to an OTLP HTTP endpoint, for example:

    OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318

If the variable is empty, spans are still created locally but are not exported.
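Per the OpenTelemetry exporter specification, OTEL_EXPORTER_OTLP_ENDPOINT is a base URL for OTLP/HTTP, and the exporter appends a signal-specific path (/v1/traces for spans). The helper below mirrors that rule, plus the empty-variable behavior described above, so you can see which URL spans would be posted to. It illustrates the spec; it is not Vylux code.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def otlp_traces_url(env: Mapping[str, str] | None = None) -> str | None:
    """Resolve the OTLP/HTTP traces URL as the OpenTelemetry spec describes."""
    env = os.environ if env is None else env
    base = env.get("OTEL_EXPORTER_OTLP_ENDPOINT", "").strip()
    if not base:
        # Matches the behavior described above: no endpoint, no export.
        return None
    # The base URL gets the signal path appended after any path it already has.
    return base.rstrip("/") + "/v1/traces"
```

The specification also defines a per-signal OTEL_EXPORTER_OTLP_TRACES_ENDPOINT that is used as-is; whether Vylux reads it is not stated on this page.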
Local Jaeger validation
To inspect end-to-end traces from the HTTP request through worker execution, run a minimal OpenTelemetry Collector plus Jaeger stack. The example below is self-contained and does not depend on any local helper files.
docker-compose example
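    services:
      jaeger:
        image: jaegertracing/all-in-one:1.76.0
        restart: unless-stopped
        environment:
          COLLECTOR_OTLP_ENABLED: "true"
        ports:
          - "16686:16686"   # Jaeger UI
      otel-collector:
        image: otel/opentelemetry-collector-contrib:0.148.0
        command: ["--config=/etc/otelcol/otelcol.yaml"]
        volumes:
          # Assumed host path: the collector config below, saved as ./otelcol.yaml
          - ./otelcol.yaml:/etc/otelcol/otelcol.yaml:ro
        restart: unless-stopped
        depends_on:
          - jaeger
        ports:
          - "4317:4317"     # OTLP gRPC
          - "4318:4318"     # OTLP HTTP (the endpoint Vylux exports to)
          - "13133:13133"   # collector health_check extension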
Minimal collector trace pipeline:
    extensions:
      health_check:
        endpoint: 0.0.0.0:13133
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
    processors:
      batch:
    exporters:
      debug:
        verbosity: normal
      otlp/jaeger:
        endpoint: jaeger:4317
        tls:
          insecure: true
    service:
      extensions: [health_check]
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [debug, otlp/jaeger]
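Before pointing Vylux at the collector, you can check the pipeline on its own by posting one span, encoded as OTLP/HTTP JSON, to http://localhost:4318/v1/traces. It should then appear in the collector's debug exporter output and in Jaeger under its service name. The service name otlp-smoke-test and the span name are hypothetical placeholders. In OTLP JSON, trace and span IDs are hex strings, enums such as span kind are integers, and 64-bit timestamps may be sent as strings.

```python
from __future__ import annotations

import json
import secrets
import time
import urllib.request


def build_smoke_span(service_name: str = "otlp-smoke-test") -> dict:
    """Build a one-span OTLP/JSON trace export request body."""
    now = time.time_ns()
    return {
        "resourceSpans": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service_name}},
            ]},
            "scopeSpans": [{
                "scope": {"name": "smoke-test"},
                "spans": [{
                    "traceId": secrets.token_hex(16),  # 32 hex characters
                    "spanId": secrets.token_hex(8),    # 16 hex characters
                    "name": "collector-smoke-test",
                    "kind": 1,  # SPAN_KIND_INTERNAL
                    "startTimeUnixNano": str(now - 1_000_000),  # 1 ms earlier
                    "endTimeUnixNano": str(now),
                }],
            }],
        }],
    }


def send_smoke_span(url: str = "http://localhost:4318/v1/traces") -> tuple[int, str]:
    """POST one span; return (HTTP status, trace ID) so you can look it up in Jaeger."""
    body = build_smoke_span()
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        trace_id = body["resourceSpans"][0]["scopeSpans"][0]["spans"][0]["traceId"]
        return resp.status, trace_id
```

If the span reaches the debug exporter but not Jaeger, the problem lies between the collector and Jaeger rather than in the application's exporter settings.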
Validation flow
- Start Jaeger and the collector.
- Set OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 for both the server and the worker.
- Submit a POST /api/jobs request, ideally for a video:transcode or video:full job.
- Capture the X-Trace-ID from the HTTP response headers or logs.
- Open http://localhost:16686 and search for service vylux, or paste the trace ID directly.
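After capturing a trace ID, you can also check from a script which services contributed spans to the trace. The sketch below calls GET /api/traces/{traceID} on the Jaeger query port (16686). That endpoint is the internal JSON API behind the Jaeger v1 UI rather than a stable public contract, so the response shape assumed here (each span's processID mapping to a serviceName under processes) may change between Jaeger versions.

```python
from __future__ import annotations

import json
import urllib.request
from collections import Counter


def spans_per_service(jaeger_response: dict) -> Counter:
    """Count spans per service name in a Jaeger v1 /api/traces response."""
    counts: Counter = Counter()
    for trace in jaeger_response.get("data", []):
        processes = trace.get("processes", {})
        for span in trace.get("spans", []):
            service = processes.get(span.get("processID"), {}).get("serviceName", "?")
            counts[service] += 1
    return counts


def fetch_trace(trace_id: str, base_url: str = "http://localhost:16686") -> dict:
    """Fetch one trace from Jaeger's internal query API (not a stable contract)."""
    with urllib.request.urlopen(f"{base_url}/api/traces/{trace_id}", timeout=5) as resp:
        return json.load(resp)
```

For a video job, spans_per_service(fetch_trace(trace_id)) should show spans from both the HTTP request and the worker; how those spans are named and grouped into services is not documented on this page.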
What to watch
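- readiness failures (vylux_readiness_failures_total)
- queue depth and task latency (vylux_queue_tasks, vylux_worker_task_duration_seconds)
- image cache behavior (vylux_image_cache_events_total)
- media job success and failure trends (vylux_worker_tasks_total)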
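These signals can be pulled from a Prometheus server through its HTTP API (GET /api/v1/query). The sketch assumes a Prometheus at a hypothetical http://localhost:9090 that scrapes Vylux. Only the metric names come from this page; label names other than check (queue, state, layer, result) and the histogram type of the duration metric are assumptions to verify against a real scrape.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

# PromQL for the signals listed above. Metric names are from this page; the label
# names (except check) and the _bucket series of a histogram are assumptions.
WATCH_QUERIES = {
    "readiness_failures_5m": "sum by (check) (increase(vylux_readiness_failures_total[5m]))",
    "queue_depth": "sum by (queue, state) (vylux_queue_tasks)",
    "task_latency_p95": (
        "histogram_quantile(0.95, sum by (le) "
        "(rate(vylux_worker_task_duration_seconds_bucket[5m])))"
    ),
    "cache_events_rate": "sum by (layer) (rate(vylux_image_cache_events_total[5m]))",
    "task_results_rate": "sum by (result) (rate(vylux_worker_tasks_total[5m]))",
}


def query_url(expr: str, base_url: str = "http://localhost:9090") -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def vector_values(api_response: dict) -> list[tuple[dict, float]]:
    """Extract (labels, value) pairs from an instant-vector query response."""
    if api_response.get("status") != "success":
        raise RuntimeError(api_response.get("error", "query failed"))
    return [(item["metric"], float(item["value"][1])) for item in api_response["data"]["result"]]


def run_query(expr: str, base_url: str = "http://localhost:9090") -> list[tuple[dict, float]]:
    """Run one query against a live Prometheus and return its vector samples."""
    with urllib.request.urlopen(query_url(expr, base_url), timeout=5) as resp:
        return vector_values(json.load(resp))
```

All of these queries return instant vectors, which is why vector_values does not handle scalar or matrix results.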
Troubleshooting hints
- GET /healthz succeeds but GET /readyz fails: PostgreSQL, Redis, or bucket reachability is usually broken.
- Worker metrics are empty: confirm that Vylux is actually running in --mode=worker and that WORKER_METRICS_PORT is not 0.
- Jaeger shows no traces: verify that the exporter points to an OTLP HTTP endpoint (http://localhost:4318 in the local stack), not the Jaeger UI port (16686).
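The last hint often comes down to a single wrong port. A small check, sketched below, flags the common mix-ups for the local stack described earlier on this page: pointing at Jaeger's UI port 16686, pointing at the collector's OTLP gRPC port 4317 instead of OTLP HTTP on 4318, omitting the URL scheme, or leaving the variable empty. The port numbers come from that compose example; adjust them for other deployments.

```python
from __future__ import annotations

from urllib.parse import urlsplit


def otlp_endpoint_problems(endpoint: str) -> list[str]:
    """Return human-readable problems with an OTEL_EXPORTER_OTLP_ENDPOINT value."""
    endpoint = endpoint.strip()
    if not endpoint:
        return ["empty: spans are created locally but not exported"]
    parts = urlsplit(endpoint)
    problems = []
    if parts.scheme not in ("http", "https"):
        problems.append("expected an http:// or https:// URL")
    try:
        port = parts.port  # raises ValueError for a non-numeric or out-of-range port
    except ValueError:
        return problems + ["port is not a valid number"]
    if port == 16686:
        problems.append("16686 is the Jaeger UI port, not an OTLP endpoint")
    elif port == 4317:
        problems.append("4317 is OTLP gRPC; this setting expects OTLP HTTP (4318 locally)")
    return problems
```

An empty list means the value at least has the right shape; it does not prove the collector is reachable.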