Flow
Diagnose a 5xx burst.
Two sources, narrow window, fast turnaround. The web tier is returning 502/503/504 to users. Goal: identify the upstream service that triggered the cascade and produce a compact RCA the team can act on within an hour.
Open a case and bring in two sources
⌘N opens a new case. Drop in two files:
- edge-access.log: your nginx/Apache combined log for the 11:30–12:00 window
- upstream-syslog.log: RFC 5424 syslog from whichever services sit behind the proxy (auth, orders, payments)
Both files are SHA-256 hashed at ingest. The hashes flow into the export bundle so the writeup carries chain-of-custody by default.
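If you want to confirm an ingest hash independently, the check can be reproduced outside Loupe. The following is a minimal sketch using Python's standard `hashlib`. The helper name `sha256_file` is hypothetical and says nothing about how Loupe computes or stores its hashes.

```python
import hashlib
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks.

    Hypothetical helper for illustration; not part of Loupe.
    """
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        # Read fixed-size chunks so large logs never sit fully in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `sha256_file("edge-access.log")` on the same bytes you dropped into the case should yield the same digest that appears in the export bundle.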
Crop to the burst
Switch the sidebar to All sources. The density chart shows two stripes: the access-log stripe spikes hard at 11:47, and the upstream stripe spikes in the same second.
Click-and-drag in the density chart to crop to the burst (~11:46–11:54). The event table now shows a few hundred events in chronological order, instead of tens of thousands.
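Conceptually, the crop is a timestamp filter followed by a chronological sort. A rough sketch of that idea for combined-format access-log lines is below. It assumes the standard bracketed timestamp field (for example `[12/Mar/2025:11:47:03 +0000]`); the function names `parse_ts` and `crop` are hypothetical and do not describe Loupe's implementation.

```python
import re
from datetime import datetime

# Bracketed timestamp field of the combined log format.
TS_RE = re.compile(r"\[([^\]]+)\]")


def parse_ts(line):
    """Return the timezone-aware timestamp of a log line, or None."""
    match = TS_RE.search(line)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
    except ValueError:
        return None


def crop(lines, start, end):
    """Keep lines with start <= timestamp < end, in chronological order."""
    kept = []
    for line in lines:
        ts = parse_ts(line)
        if ts is not None and start <= ts < end:
            kept.append((ts, line))
    kept.sort(key=lambda pair: pair[0])
    return [line for _, line in kept]
```

Passing timezone-aware `start` and `end` values for 11:46 and 11:54 mirrors the drag selection described above.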
Read the findings
For a 5xx burst against a healthy backend, two rules typically fire together:
- high web.upstream.5xx_burst (302 events)
- high web.upstream.timeout (87 events)
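To sanity-check the event counts by hand, you can tally gateway errors per minute straight from the access log. The sketch below is only an illustration of that tally: the actual logic and thresholds behind `web.upstream.5xx_burst` are not documented here, and the regex assumes the standard combined log layout.

```python
import re
from collections import Counter

# Timestamp, quoted request line, then the three-digit status code.
LINE_RE = re.compile(
    r'\[(?P<ts>[^\]]+)\] "(?:[^"\\]|\\.)*" (?P<status>\d{3}) '
)
GATEWAY_ERRORS = {"502", "503", "504"}


def gateway_errors_per_minute(lines):
    """Count 502/503/504 responses, keyed by minute (dd/Mon/yyyy:HH:MM)."""
    per_minute = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match and match.group("status") in GATEWAY_ERRORS:
            # The first 17 characters of the timestamp end at the minute.
            per_minute[match.group("ts")[:17]] += 1
    return per_minute
```

A minute whose count sits far above its neighbours is the burst you would expect the density chart to show.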
Click into web.upstream.timeout. The citations resolve to nginx lines like upstream timed out (110: Connection timed out). Loupe co-locates them with upstream syslog lines from the same second. The upstream lines name the service.
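The co-location step amounts to bucketing events from both sources by wall-clock second and keeping the seconds where both have entries. A minimal sketch of that idea follows. It assumes RFC 5424 header fields in their standard order and timezone-aware timestamps on both sides; `parse_rfc5424` and `colocate` are hypothetical names, not Loupe functions.

```python
from collections import defaultdict
from datetime import datetime


def parse_rfc5424(line):
    """Split an RFC 5424 line into (timestamp, hostname, app_name, rest).

    Header order is: <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME ...
    """
    _header, ts, host, app, rest = line.split(" ", 4)
    # Normalise a trailing "Z" so older Python versions can parse it.
    ts = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    return ts, host, app, rest


def colocate(edge_events, upstream_events):
    """Map each shared second to (edge texts, upstream texts).

    Both inputs are iterables of (aware datetime, text) pairs.
    """
    buckets = defaultdict(lambda: ([], []))
    for ts, text in edge_events:
        buckets[ts.replace(microsecond=0)][0].append(text)
    for ts, text in upstream_events:
        buckets[ts.replace(microsecond=0)][1].append(text)
    # Keep only seconds where both sources logged something.
    return {sec: pair for sec, pair in buckets.items() if pair[0] and pair[1]}
```

The hostname and app name returned by `parse_rfc5424` are the fields that name the service behind a given timeout.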
Identify the upstream
Loupe's entity extractor pulls hostnames, IPs, and service names from the cited events. Open the Affected Assets panel — every host that appeared in the cropped window is listed with a count of how many cited events name it.
The upstream that timed out shows the highest count by an order of magnitude. Mark it Affected, mark the proxy Downstream, and mark unrelated infra rows Unrelated so they don't pollute the writeup.
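The "order of magnitude" judgment can be cross-checked with a simple tally. The sketch below counts only the HOSTNAME field of RFC 5424 lines, which is narrower than Loupe's extractor (it also pulls IPs and service names). The function names and the default factor of 10 are assumptions chosen for illustration.

```python
from collections import Counter


def host_counts(syslog_lines):
    """Return (hostname, count) pairs, most frequent first."""
    counts = Counter()
    for line in syslog_lines:
        fields = line.split(" ", 3)
        # fields[2] is HOSTNAME; "-" is the RFC 5424 nil value.
        if len(fields) >= 3 and fields[2] != "-":
            counts[fields[2]] += 1
    return counts.most_common()


def dominant_host(counts, factor=10):
    """Return the top host if it leads the runner-up by `factor`, else None."""
    if not counts:
        return None
    if len(counts) == 1:
        return counts[0][0]
    (top, top_n), (_, next_n) = counts[0], counts[1]
    return top if top_n >= factor * next_n else None
```

If `dominant_host` returns `None`, no single host stands out in the cropped window, and the cited events deserve a closer read before marking anything Affected.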
Pick the compact template
Open the RCA editor. For a fast-turnaround web incident, two templates fit:
- PagerDuty Incident Response — six sections, takes ten minutes to fill in
- Google SRE Postmortem — eleven sections, full retrospective; pick this when the incident has cross-team learnings
For the standup-deadline case, PagerDuty is the right call. The Action Items section pre-fills from the rule fires — each item already cites the events it's based on.
Export
File → Export Case Bundle. You get a directory plus an optional encrypted zip:
- RCA.pdf — the writeup the team will skim
- RCA.md — Markdown for the runbook
- IODEF.xml — for an ITSM pipeline if you have one
- Supporting Evidence/timeline.csv — every cited event
- Hashes.txt — recipient verification
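A recipient could verify the bundle with a few lines of Python. The layout of Hashes.txt is not specified here, so this sketch assumes a `sha256sum`-style manifest: one `<hex digest>  <relative path>` entry per line. Adjust the parsing if the real file differs. `verify_bundle` is a hypothetical helper.

```python
import hashlib
from pathlib import Path


def verify_bundle(bundle_dir, manifest_name="Hashes.txt"):
    """Return the relative paths whose SHA-256 does not match the manifest.

    Assumes sha256sum-style lines: "<hex digest>  <relative path>".
    """
    bundle = Path(bundle_dir)
    mismatches = []
    manifest = (bundle / manifest_name).read_text(encoding="utf-8")
    for raw in manifest.splitlines():
        if not raw.strip():
            continue
        # Split on the first run of whitespace so paths may contain spaces.
        expected, rel_path = raw.split(maxsplit=1)
        rel_path = rel_path.strip()
        actual = hashlib.sha256((bundle / rel_path).read_bytes()).hexdigest()
        if actual != expected.lower():
            mismatches.append(rel_path)
    return mismatches
```

An empty list means every listed file matches its recorded digest.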
A typical 5xx-burst case ships in 15–25 minutes from drop-in to export — the upstream identification is what takes the time; everything mechanical is automatic.
Done.
Compared to the four-source database outage flow, a 5xx burst is shorter on every axis: fewer files, fewer rules to chase, smaller timeline window. The shape of the work is the same — Loupe brings the mechanical rigor so you can spend the time on identification.