M6.3 — Capstone lab¶

What you'll do

Design a boot + smoke Gauntlet test for a sample title, end to end — from the mental model through authoring to the pipeline gate and a triage runbook. This pulls every module together into one artifact you could hand to a team.

How it applies (QA)

Real Gauntlet ownership isn't writing one TickTest() — it's the whole chain: deciding what to test, where it runs, what "passed" means, what it gates, and how the next person triages it red. The capstone forces the chain to be coherent. Gaps you skated over in the module exercises show up here, where each piece has to connect to the next.

The brief¶

Title: Skyward Siege — a 4v4 online multiplayer shooter shipping on Win64, PS5, and Switch. Dedicated-server authoritative. Has an existing in-engine functional-test suite for single-player systems. No Gauntlet coverage yet.

Your charge: stand up the first Gauntlet coverage. Leadership wants two things gated by next milestone: (1) every platform's build boots, and (2) a smoke that two clients can connect to a dedicated server and complete a 60-second bot match. It must run nightly per platform and pre-submit (boot only) on Win64.

Work the lab in order. Each part names the module it draws on. Write your answers down — this is a design document, not a quiz.

Part A — Mental model & scope (M1)

State, in one sentence each, what Gauntlet will and will not do for this charge (cook? assert? launch?).
The team already has in-engine functional tests. For the boot requirement, do you need new game-side code? For the connect-and-match smoke, do you? Justify each from M1.1's boundaries and M1.2's decision rule.
Which built-in test(s) (M2.3) cover part of this charge with zero custom C#? Which part still needs a custom test, and why?

Part B — The run path (M2)

Write the RunUnreal command line for the Win64 boot gate. Mark every token as command / Gauntlet param / pass-through (M2.2), and flag any verify on a real build assumptions.
Pick the -configuration for the nightly smoke and justify it against marker visibility (M2.2, M5.3).
List the four result surfaces (M2.4) you'd check first when the smoke goes red, in order.

Part C — Sessions, roles, devices, builds (M3)

Draw the role layout for the smoke: roles, types, and the launch order (M3.1, M3.2).
Decide device placement for the nightly smoke on PS5 (same-kit vs. per-kit) and justify the trade-off (M3.3). What changes if you scale from local Win64 to PS5 kits?
Name the upstream artifacts each platform's run depends on and the build provenance line you'd attach to a result (M3.4).

Part D — Authoring (M4)

Sketch the custom smoke test's GetConfiguration(): roles, per-role args, MaxDuration (justify the number for the slowest target) (M4.2).
In words, describe its TickTest() success and failure conditions (M4.3). What's the late enough observable that means "match completed"?
Name one isolation rule (M4.4) this networked test must obey to survive the parallel nightly, and the flake it prevents.

Part E — Game side (M5)

Does the smoke need a TestController (M5.1)? Decide and justify against "is it already in the logs / does it need in-game action or un-logged state."
If yes, split the logic C# node vs. C++ controller (M5.2): who launches, who drives the bot match, who emits the completion marker, who sets the verdict.
Specify the assertion stack (M5.3): the completion marker (name + where emitted + verbosity for your chosen configuration), and whether a screenshot or perf capture earns its place here.

Part F — Pipeline & ownership (M6)

Place both tests in gates (M6.1): what does a red boot gate block pre-submit? what does a red nightly smoke block? Name what each prevents.
Write the retry policy (M6.2): which failures justify a retry, the cap, and how retries are recorded. State one failure you must never retry past.
Write the five-line triage runbook header the next on-call follows when the PS5 smoke is red.

Reference solution (sketch)¶

Read after you've drafted yours. This is a defensible design, not the only one; compare reasoning, not wording.

Part A — Mental model & scope

Will: launch each platform's cooked build, orchestrate the 1-server/2-client smoke session, detect crashes, collect logs/artifacts, report pass/fail. Won't: cook/stage the builds (upstream), and won't decide "passed" for you — you define it.
Boot: no new game-side code — a log-driven boot check suffices (boundary #2). Smoke: likely yes — "two clients completed a bot match" needs in-game sequencing/state not in default logs (M1.2 puts multi-process + in-game flow squarely in Gauntlet, and the in-game part wants a controller).
UE.BootTest per platform covers requirement (1) with zero custom C#; UE.EditorBootTest is a cheap pre-check. The connect-and-match smoke is the custom test — multi-role + in-game completion condition the built-ins don't express.

Part B — The run path

RunUAT.bat RunUnreal -test=UE.BootTest -project=SkywardSiege -platform=Win64 -configuration=Development -build=local — command RunUnreal; Gauntlet params -test/-project/-platform/-configuration/-build; no pass-through needed. Verify: -build token (local/path/id).
Test is the honest nightly target (close to shipping) provided your completion marker survives it; if the marker is Display/Verbose it's stripped (M5.3) — either raise its verbosity or run the smoke in Development and add a separate Test boot gate. Name the trade-off explicitly.
Exit code → crash/ensure summary → failing role's log → /Saved artifacts (M2.4).

Part C — Sessions, roles, devices, builds

1 dedicated server (hosts the bot map, ?listen), 2 clients (connect to server). Order: server up and listening → clients launch pointed at it (M3.2).
Per-kit is more honest (real network, per-device behavior) but costs 3 reservations; same-kit is cheaper and fine for a first logic pass. For the gating nightly, per-device is worth it for a networked title. Scaling from local Win64: the roles stay the same; the devices change (reservation, build copy, per-kit flake) and you now need cooked PS5 client/server builds (M3.3).
Upstream: compiled + cooked + staged client/server for each platform. Provenance: SkywardSiege · platform=PS5 · configuration=Test · staged · <build path/id> · UE 5.x (M3.4).

Part D — Authoring

GetConfiguration(): RequireRole(Server) with the bot-map + listen args; two RequireRole(Client); MaxDuration sized for the slowest target — a 60s match plus boot/load/ connect on Switch, e.g. 240–300s (justify, don't guess).
Pass: both clients emit the match-completion marker and no crash on any role. Fail: any crash/ensure, or timeout with the marker unseen. Late-enough observable: a marker fired at match end (e.g. Smoke_MatchComplete), not at match start or level load.
No hardcoded ports — take the server port from the framework's allocation (M4.4); prevents the green-solo/red-in-batch port collision when nightlies run concurrently.

Part E — Game side

Yes — running a bot match and detecting its completion is in-game sequencing/state, not in default logs (M5.1).
C# node: launch the 3 roles, set MaxDuration, watch for both clients' Smoke_MatchComplete markers, fold in crash detection, set the verdict. C++ controller (per client and/or server): start/observe the bot match, detect completion, emit the marker. Verdict stays in C# (M5.2) — only it sees all roles + crashes.
Marker Smoke_MatchComplete, emitted at match end at a verbosity present in the configuration under test (M5.3). A screenshot at completion is reasonable insurance against "completed but rendered black"; a perf capture is optional for a smoke (belongs in a dedicated perf test) — say so rather than bolting it on.

Part F — Pipeline & ownership

Pre-submit Win64 boot red → blocks the submit (M6.1). Nightly per-platform smoke red → marks that platform's nightly build not promotable / flags the CL. Boot also gates promotion.
Retry only on classified transient infra — pool/device unavailability, a known lab-network blip — capped (e.g. 1 retry), every retry logged and surfaced (a green-after-retry is reported as such). Never retry past a reproducible crash or an isolation/marker bug — that launders a real signal (M6.2).
Runbook header: (1) skipped or failed? grey → upstream cook/stage, stop. (2) crash summary first. (3) localize tier via symptom map (M1.3). (4) probes: other kits / solo vs batch / configuration-only / slow-target timeout / reproduces-clean. (5) classify + route: infra-or-test (you fix; retry/quarantine) vs. product (file with provenance + artifacts).

Capstone done when

[ ] Every part is answered as a connected design — Part D's marker is the one Part E emits and Part B asserts in a configuration where it survives.
[ ] Boot coverage uses a built-in; the smoke is a justified custom test with a controller.
[ ] The role layout, device choice, and build provenance are all specified.
[ ] Both tests are placed in named gates with a stated retry policy and a triage runbook.
[ ] Every version-specific assumption is flagged verify on a real build.

You've reached the end of the workbook. From here, the next step is a real source build: stand up UE.EditorBootTest on your project, then UE.BootTest, then port your Skyward Siege capstone design onto an actual .Automation project — verifying each flagged assumption as you go.

Back to Home · see the Glossary for any term.