M5.3 — Assertions, markers, capture¶
What you'll learn
- The assertion vocabulary Gauntlet teams build from: log markers, screenshots, and performance/diagnostic captures.
- What makes a marker good (stable, unique, configuration-visible).
- How to choose a capture that actually catches the bug you care about.
How it applies (QA)
Gauntlet has no built-in assert (M1.1, boundary #3) — your assertions are whatever you can
observe from outside. The quality of a Gauntlet suite is mostly the quality of its markers and
captures. A flaky or shallow marker is a flaky or shallow test, no matter how clean the C# around
it is.
Concepts¶
Three ways to assert¶
- Log markers — a unique string the game/controller prints; the harness matches it (M2.4, M4.3). The default, cheapest assertion. "Passed" usually means "saw marker X, no crash."
- Screenshots — capture the framebuffer at a defined moment and compare (against a reference, or just preserve for human/automated review). Catches visual failures a log can't: black screen, missing UI, corrupted render — all of which can happen while the log says "fine."
- Performance / diagnostic captures — frame timing, hitches, memory reports,
statdumps, profiling traces written to/Saved. Turn "it ran" into "it ran within budget." Essential for perf-regression and soak tests.
A mature suite layers these: a marker says it reached the state, a screenshot says it looked right there, a perf capture says it got there fast enough.
What makes a good marker¶
A marker is a contract between the game and the test. Good ones are:
- Unique — a string that appears only when the intended event happens.
Donematches a hundred unrelated lines;LogShooter: Smoke_ReachedPlayablematches one. - Stable — not tied to wording that changes every refactor. Define markers deliberately (a named constant the game emits), don't scrape incidental log prose.
- Configuration-visible — present in the configuration you assert against. A marker logged at
Display/Verboseverbosity is compiled out ofShipping(M2.2); if you assert in Shipping, emit it at a surviving verbosity or via a channel that ships. - Late enough — emitted only once the thing it certifies has actually happened (M4.3's false-pass trap). "Reached playable" must fire after the level streamed and the pawn spawned, not at level-load request.
Choosing a capture for the bug¶
Match the capture to the failure mode:
| You want to catch… | Use | Because |
|---|---|---|
| Reached a state / a step completed | log marker | cheap, exact, ends up in artifacts |
| Black screen / missing HUD / render corruption | screenshot | logs say "loaded"; the pixels say otherwise |
| Frame drops / hitching / regressing FPS | perf capture | a marker can't see 12 fps |
| Memory growth over a soak | memory report over time | leaks are invisible to a single marker |
| A crash/fatal/ensure | (automatic) Gauntlet detection | the base catches these without a marker |
The recurring lesson from M4.3/M5.2: a green marker certifies only that one observable. If "looks right" or "fast enough" matters, you need a screenshot or perf capture — the marker won't catch it.
Pitfall: the marker that lies in Shipping
The single most common assertion bug: a perfectly good marker emitted at a verbosity that
Shipping strips. The Development nightly is green for months; the first Shipping/cert run is red
(timeout, marker never seen) — and the build was fine. Decide the marker's verbosity for the
configuration you ship, and prove it survives there before trusting the suite.
Worked example — hardening a "reached playable" assertion¶
Start: pass on log line Loading level.... Problems, fixed in passes:
- Not unique / too early —
Loading level...fires at the request, not arrival, and matches other loads. Replace with a deliberate, unique marker emitted after spawn:LogShooter: Smoke_ReachedPlayable. - Looked-right unknown — marker can't see a black level. Add a screenshot at the moment the
marker fires; preserve it in
/Savedfor review. - Fast-enough unknown — add a perf capture over the first 10s of play; fail if average frame time exceeds budget.
- Shipping-visible — confirm
Smoke_ReachedPlayableis emitted at a verbosity that survives the configuration the suite runs in.
Now "passed" means reached playable, looked right, ran within budget, in the configuration that ships — a claim worth trusting.
Lab — Specify the assertions for a level-load perf test
Goal: "Loading TestArena on PS5 completes within 8s and sustains ≥30 fps for the first 10s." Specify, on paper: (1) the marker(s) and where each is emitted, (2) the perf capture and its pass thresholds, (3) whether a screenshot adds value here and why/why not, (4) the verbosity/ configuration check for each marker, (5) the failure each assertion would and would not catch.
Exercise 1 — Grade the marker
Rate each marker good/bad and why: ok, LogShooter: Smoke_ReachedPlayable,
Display: entering main menu now, a marker logged Verbose asserted in Shipping.
Exercise 2 — Pick the capture
Which capture (marker / screenshot / perf / memory-over-time) for each: (a) HUD renders after load, (b) no frame hitches during a cutscene, (c) reached the lobby, (d) a 30-minute soak doesn't leak.
Self-check — answers
Lab: (1) Smoke_LoadStarted at load request and Smoke_ReachedPlayable after spawn — load
time = delta between them, must be ≤8s. (2) Perf capture over the first 10s post-playable; fail
if average fps <30 (or frame time >33.3 ms), ideally also flag worst-1% hitches. (3) Screenshot
does add value — confirms the arena rendered rather than loading to a black/garbage frame that
still hits 30 fps. (4) Both markers must be emitted at a verbosity present in the PS5 configuration
under test. (5) Catches: slow load, low fps, black-screen (via shot). Doesn't catch: gameplay
correctness, audio, anything after the 10s window.
Exercise 1: ok — bad (not unique, not stable); Smoke_ReachedPlayable — good (unique,
stable, named); Display: entering main menu now — bad (incidental prose, brittle, likely
stripped in Shipping); Verbose marker asserted in Shipping — bad (compiled out → never seen →
false timeout).
Exercise 2: a screenshot, b perf capture, c log marker, d memory-over-time.
Done when
- [ ] You can name the three assertion types and what each uniquely catches.
- [ ] You can state the four properties of a good marker (unique, stable, configuration-visible, late enough).
- [ ] You can match a capture type to a failure mode.
- [ ] You can explain and avoid the Shipping-stripped-marker false negative.