M5.3 — Assertions, markers, capture¶

What you'll learn

The assertion vocabulary Gauntlet teams build from: log markers, screenshots, and performance/diagnostic captures.
What makes a marker good (stable, unique, configuration-visible).
How to choose a capture that actually catches the bug you care about.

How it applies (QA)

Gauntlet has no built-in assert (M1.1, boundary #3) — your assertions are whatever you can observe from outside. The quality of a Gauntlet suite is mostly the quality of its markers and captures. A flaky or shallow marker is a flaky or shallow test, no matter how clean the C# around it is.

Concepts¶

Three ways to assert¶

Log markers — a unique string the game/controller prints; the harness matches it (M2.4, M4.3). The default, cheapest assertion. "Passed" usually means "saw marker X, no crash."
Screenshots — capture the framebuffer at a defined moment and compare (against a reference, or just preserve for human/automated review). Catches visual failures a log can't: black screen, missing UI, corrupted render — all of which can happen while the log says "fine."
Performance / diagnostic captures — frame timing, hitches, memory reports, stat dumps, profiling traces written to /Saved. Turn "it ran" into "it ran within budget." Essential for perf-regression and soak tests.

A mature suite layers these: a marker says it reached the state, a screenshot says it looked right there, a perf capture says it got there fast enough.

What makes a good marker¶

A marker is a contract between the game and the test. Good ones are:

Unique — a string that appears only when the intended event happens. Done matches a hundred unrelated lines; LogShooter: Smoke_ReachedPlayable matches one.
Stable — not tied to wording that changes every refactor. Define markers deliberately (a named constant the game emits), don't scrape incidental log prose.
Configuration-visible — present in the configuration you assert against. A marker logged at Display/Verbose verbosity is compiled out of Shipping (M2.2); if you assert in Shipping, emit it at a surviving verbosity or via a channel that ships.
Late enough — emitted only once the thing it certifies has actually happened (M4.3's false-pass trap). "Reached playable" must fire after the level streamed and the pawn spawned, not at level-load request.

Choosing a capture for the bug¶

Match the capture to the failure mode:

You want to catch…	Use	Because
Reached a state / a step completed	log marker	cheap, exact, ends up in artifacts
Black screen / missing HUD / render corruption	screenshot	logs say "loaded"; the pixels say otherwise
Frame drops / hitching / regressing FPS	perf capture	a marker can't see 12 fps
Memory growth over a soak	memory report over time	leaks are invisible to a single marker
A crash/fatal/ensure	(automatic) Gauntlet detection	the base catches these without a marker

The recurring lesson from M4.3/M5.2: a green marker certifies only that one observable. If "looks right" or "fast enough" matters, you need a screenshot or perf capture — the marker won't catch it.

Pitfall: the marker that lies in Shipping

The single most common assertion bug: a perfectly good marker emitted at a verbosity that Shipping strips. The Development nightly is green for months; the first Shipping/cert run is red (timeout, marker never seen) — and the build was fine. Decide the marker's verbosity for the configuration you ship, and prove it survives there before trusting the suite.

Worked example — hardening a "reached playable" assertion¶

Start: pass on log line Loading level.... Problems, fixed in passes:

Not unique / too early — Loading level... fires at the request, not arrival, and matches other loads. Replace with a deliberate, unique marker emitted after spawn: LogShooter: Smoke_ReachedPlayable.
Looked-right unknown — marker can't see a black level. Add a screenshot at the moment the marker fires; preserve it in /Saved for review.
Fast-enough unknown — add a perf capture over the first 10s of play; fail if average frame time exceeds budget.
Shipping-visible — confirm Smoke_ReachedPlayable is emitted at a verbosity that survives the configuration the suite runs in.

Now "passed" means reached playable, looked right, ran within budget, in the configuration that ships — a claim worth trusting.

Lab — Specify the assertions for a level-load perf test

Goal: "Loading TestArena on PS5 completes within 8s and sustains ≥30 fps for the first 10s." Specify, on paper: (1) the marker(s) and where each is emitted, (2) the perf capture and its pass thresholds, (3) whether a screenshot adds value here and why/why not, (4) the verbosity/ configuration check for each marker, (5) the failure each assertion would and would not catch.

Exercise 1 — Grade the marker

Rate each marker good/bad and why: ok, LogShooter: Smoke_ReachedPlayable, Display: entering main menu now, a marker logged Verbose asserted in Shipping.

Exercise 2 — Pick the capture

Which capture (marker / screenshot / perf / memory-over-time) for each: (a) HUD renders after load, (b) no frame hitches during a cutscene, (c) reached the lobby, (d) a 30-minute soak doesn't leak.

Self-check — answers

Lab: (1) Smoke_LoadStarted at load request and Smoke_ReachedPlayable after spawn — load time = delta between them, must be ≤8s. (2) Perf capture over the first 10s post-playable; fail if average fps <30 (or frame time >33.3 ms), ideally also flag worst-1% hitches. (3) Screenshot does add value — confirms the arena rendered rather than loading to a black/garbage frame that still hits 30 fps. (4) Both markers must be emitted at a verbosity present in the PS5 configuration under test. (5) Catches: slow load, low fps, black-screen (via shot). Doesn't catch: gameplay correctness, audio, anything after the 10s window.

Exercise 1: ok — bad (not unique, not stable); Smoke_ReachedPlayable — good (unique, stable, named); Display: entering main menu now — bad (incidental prose, brittle, likely stripped in Shipping); Verbose marker asserted in Shipping — bad (compiled out → never seen → false timeout).

Exercise 2: a screenshot, b perf capture, c log marker, d memory-over-time.

Done when

[ ] You can name the three assertion types and what each uniquely catches.
[ ] You can state the four properties of a good marker (unique, stable, configuration-visible, late enough).
[ ] You can match a capture type to a failure mode.
[ ] You can explain and avoid the Shipping-stripped-marker false negative.

Next: M6.1 — Gauntlet in a pipeline.