Quarkus observability as a practical reference

At some point, every project claims “we have monitoring” and what it really means is:

one dashboard nobody trusts,
logs without context,
traces that exist only in slides.

So I built a proper reference implementation in quarkus-observability.

The goal was simple: have one repo I can copy from for future projects, and finally understand the full path from request to trace to logs to metrics without hand-wavy gaps.

What happened over that session

The commit timeline is basically the diary:

2026-02-14 13:43 | Initial commit
2026-02-14 14:00 | initial CoffeeShop setup
2026-02-14 15:40-16:20 | multiple amend/rebase/reset iterations
2026-02-14 17:29 | dashboard and UX polish pass

Based on the reflog, the visible implementation sprint happened on Saturday, February 14, 2026, from early afternoon into early evening, with several rewrite passes in between. I still would not pin this to an exact duration, because the thinking and reading part was the bigger chunk anyway. It ended somewhere between “just one more panel” and “why are there 12 tabs open in Grafana?“.

What I actually built

The stack is local-first with Docker Compose and includes:

Quarkus app (app)
OTel Collector with tail sampling
Tempo for traces
Prometheus for metrics
Loki + Promtail for logs
Grafana as central UI
Redpanda/Kafka + MariaDB for the business flow (CoffeeShop order lifecycle)

Grafana technical overview

The hardest part was not writing the YAML or Java code. It was reading the documentation of each component carefully enough to wire everything together correctly: Quarkus, OTel Collector, Tempo, Prometheus, Loki, Promtail and Grafana all have slightly different expectations.

The part that made it click for me

The real win was making correlation explicit instead of accidental:

Requests create spans in Quarkus.
Spans go via OTLP HTTP to the collector.
Collector keeps important traces (errors, slow requests, business endpoints) and down-samples the rest.
Logs include trace_id and span_id.
In Grafana, I can jump from trace to logs and back without guessing.

That sampling setup looked roughly like this:

- keep-errors
- keep-slow-traces (>= 1000ms)
- keep-business-order-lifecycle (/coffeeshop/orders...)
- sample-technical-q-endpoints (5%)
- sample-fast-normal-traces (10%)

And this log format turned out to be non-negotiable:

quarkus.log.console.format=... trace_id=%X{traceId} span_id=%X{spanId} ...

Without those fields, “observability” quickly becomes archaeology.

Why this became my reference repo

I wanted something practical I can copy into future services, not another “hello telemetry” demo. So this repo now includes:

outbox-based event publishing (DB + Kafka flow)
runbook-style docs and config reference
working dashboards for technical and management views
a small load script (./scripts/load-coffeeshop.sh) to generate realistic traces quickly

The load script was especially useful because it repeatedly creates and picks up orders, which made it easy to validate end-to-end behavior under repeated flows.

One screenshot that sums it up

This is the point where it felt “real”: opening a trace in Grafana and correlating it with logs.

Grafana traces explore

If it’s not observable end-to-end, it doesn’t exist yet.

What I would copy first into other projects

If I had to pick only three things:

trace/log correlation (trace_id, span_id in logs)
head + tail sampling strategy (not only one side)
a runnable local stack that new teammates can start with one command

Everything else is optimization.

And yes, this started as “just a quick setup”.