M6.3 — Capstone lab
What you'll do
Design a boot + smoke Gauntlet test for a sample title, end to end — from the mental model through authoring to the pipeline gate and a triage runbook. This pulls every module together into one artifact you could hand to a team.
How it applies (QA)
Real Gauntlet ownership isn't writing one TickTest() — it's the whole chain: deciding what to
test, where it runs, what "passed" means, what it gates, and how the next person triages it when it goes red.
The capstone forces the chain to be coherent. Gaps you skated over in the module exercises show up
here, where each piece has to connect to the next.
The brief
Title: Skyward Siege, a 4v4 online multiplayer shooter shipping on Win64, PS5, and Switch. Dedicated-server authoritative. Has an existing in-engine functional-test suite for single-player systems. No Gauntlet coverage yet.
Your charge: stand up the first Gauntlet coverage. Leadership wants two things gated by next milestone: (1) every platform's build boots, and (2) a smoke test showing that two clients can connect to a dedicated server and complete a 60-second bot match. The coverage must run nightly per platform and pre-submit (boot only) on Win64.
Work the lab in order. Each part names the module it draws on. Write your answers down — this is a design document, not a quiz.
Part A — Mental model & scope (M1)
- State, in one sentence each, what Gauntlet will and will not do for this charge (cook? assert? launch?).
- The team already has in-engine functional tests. For the boot requirement, do you need new game-side code? For the connect-and-match smoke, do you? Justify each from M1.1's boundaries and M1.2's decision rule.
- Which built-in test(s) (M2.3) cover part of this charge with zero custom C#? Which part still needs a custom test, and why?
Part B — The run path (M2)
- Write the RunUnreal command line for the Win64 boot gate. Mark every token as command / Gauntlet param / pass-through (M2.2), and flag any assumption as verify on a real build.
- Pick the -configuration for the nightly smoke and justify it against marker visibility (M2.2, M5.3).
- List the four result surfaces (M2.4) you'd check first when the smoke goes red, in order.
Part C — Sessions, roles, devices, builds (M3)
- Draw the role layout for the smoke: roles, types, and the launch order (M3.1, M3.2).
- Decide device placement for the nightly smoke on PS5 (same-kit vs. per-kit) and justify the trade-off (M3.3). What changes if you scale from local Win64 to PS5 kits?
- Name the upstream artifacts each platform's run depends on and the build provenance line you'd attach to a result (M3.4).
Part D — Authoring (M4)
- Sketch the custom smoke test's GetConfiguration(): roles, per-role args, and MaxDuration (justify the number for the slowest target) (M4.2).
- In words, describe its TickTest() success and failure conditions (M4.3). What's the late-enough observable that means "match completed"?
- Name one isolation rule (M4.4) this networked test must obey to survive the parallel nightly, and the flake it prevents.
Part E — Game side (M5)
- Does the smoke need a TestController (M5.1)? Decide and justify against "is it already in the logs, or does it need in-game action or un-logged state?"
- If yes, split the logic between the C# node and the C++ controller (M5.2): who launches, who drives the bot match, who emits the completion marker, who sets the verdict.
- Specify the assertion stack (M5.3): the completion marker (name + where emitted + verbosity for your chosen configuration), and whether a screenshot or perf capture earns its place here.
Part F — Pipeline & ownership (M6)
- Place both tests in gates (M6.1): what does a red boot gate block pre-submit? what does a red nightly smoke block? Name what each prevents.
- Write the retry policy (M6.2): which failures justify a retry, the cap, and how retries are recorded. State one failure you must never retry past.
- Write the five-line triage runbook header the next on-call follows when the PS5 smoke is red.
Reference solution (sketch)
Read after you've drafted yours. This is a defensible design, not the only one; compare reasoning, not wording.
Part A — Mental model & scope
- Will: launch each platform's cooked build, orchestrate the 1-server/2-client smoke session, detect crashes, collect logs/artifacts, report pass/fail. Won't: cook/stage the builds (upstream), and won't decide "passed" for you — you define it.
- Boot: no new game-side code — a log-driven boot check suffices (boundary #2). Smoke: likely yes — "two clients completed a bot match" needs in-game sequencing/state not in default logs (M1.2 puts multi-process + in-game flow squarely in Gauntlet, and the in-game part wants a controller).
- UE.BootTest per platform covers requirement (1) with zero custom C#; UE.EditorBootTest is a cheap pre-check. The connect-and-match smoke is the custom test: it needs multiple roles and an in-game completion condition that the built-ins don't express.
Part B — The run path
- RunUAT.bat RunUnreal -test=UE.BootTest -project=SkywardSiege -platform=Win64 -configuration=Development -build=local
  Command: RunUnreal. Gauntlet params: -test, -project, -platform, -configuration, -build. Pass-through: none needed. Verify on a real build: the forms the -build token accepts (local / path / id).
- Test is the honest nightly target (close to shipping), provided your completion marker survives it. If the marker is logged at Display or Verbose, it's stripped (M5.3): either raise its verbosity or run the smoke in Development and add a separate Test boot gate. Name the trade-off explicitly.
- Exit code → crash/ensure summary → failing role's log → /Saved artifacts (M2.4).
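The configuration choice in the second answer reduces to one comparison: is the marker at least as severe as the most verbose level the build under test still emits? A minimal C++ model follows, purely to make the decision explicit. The enum only mirrors the ordering of Unreal's log verbosities (lower value means more severe); it is not the engine type. The per-configuration floor is an input, not a fact: the usage example assumes a Test floor of Warning as one reading of the M5.3 claim, and that is exactly the kind of assumption to verify on a real build. All names here are hypothetical.

```cpp
#include <string>

// Stand-in for the engine's verbosity ordering (lower value = more severe).
// A model of the order only, NOT Unreal's ELogVerbosity type.
enum class Verbosity { Fatal = 1, Error, Warning, Display, Log, Verbose, VeryVerbose };

// A line survives if it is at least as severe as the most verbose level the
// configuration still emits. `floor` is an assumption per configuration.
bool MarkerSurvives(Verbosity marker, Verbosity floor) {
    return static_cast<int>(marker) <= static_cast<int>(floor);
}

struct SmokePlan {
    std::string configuration;   // configuration the nightly smoke runs in
    bool needsSeparateTestBoot;  // add a Test boot gate to keep shipping-like coverage
};

// Keep the smoke in Test if the marker survives there; otherwise fall back to
// Development plus a separate Test boot gate. (The other option, raising the
// marker's verbosity, is a code change you make before asking this question.)
SmokePlan ChooseSmokeConfiguration(Verbosity marker, Verbosity testFloor) {
    if (MarkerSurvives(marker, testFloor)) {
        return {"Test", false};
    }
    return {"Development", true};
}
```

With the assumed Warning floor, a Display marker yields Development plus a Test boot gate, while a Warning marker keeps the smoke in Test. Writing the floor down as a parameter makes the trade-off explicit instead of hiding it in a habit.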
Part C — Sessions, roles, devices, builds
- 1 dedicated server (hosts the bot map; a dedicated server normally listens without ?listen, so verify on a real build whether your server args need it) and 2 clients (connect to the server). Order: server up and listening → clients launch pointed at it (M3.2).
- Per-kit is more honest (real network, per-device behavior) but costs 3 reservations; same-kit is cheaper and fine for a first logic pass. For the gating nightly, per-kit is worth it for a networked title. Scaling from local Win64: the roles stay the same, but the devices change (reservation, build copy, per-kit flake), and you now need cooked PS5 client/server builds (M3.3).
- Upstream: compiled, cooked, and staged client/server builds for each platform. Provenance: SkywardSiege · platform=PS5 · configuration=Test · staged · <build path/id> · UE 5.x (M3.4).
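The launch-order rule and the provenance line can be modeled together: start the server, poll for a listening signal with a bound, launch the clients only after it arrives, and stamp the result. This is a minimal C++ sketch of the ordering rule (M3.2), not Gauntlet code; in a real test the framework does the launching. RoleLauncher, the role names Client0/Client1, and the build id used in the usage example are all hypothetical.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical launcher interface: NOT a Gauntlet API, just the two
// operations the ordering rule needs.
struct RoleLauncher {
    std::function<void(const std::string& role)> launch;
    // Polled once per attempt; returns true once the server reports listening.
    std::function<bool()> serverListening;
};

// Launches the server first, then the two clients only after the server
// reports listening. Returns the roles launched, in order; a result holding
// only "Server" means it never came up within maxPolls and no client started.
// A real loop would sleep between polls instead of spinning.
std::vector<std::string> LaunchSmokeSession(const RoleLauncher& l, int maxPolls) {
    std::vector<std::string> order;
    l.launch("Server");
    order.push_back("Server");
    for (int i = 0; i < maxPolls; ++i) {
        if (l.serverListening()) {
            for (const char* client : {"Client0", "Client1"}) {
                l.launch(client);
                order.push_back(client);
            }
            break;
        }
    }
    return order;
}

// Fields of the provenance line attached to every result (M3.4).
struct Provenance {
    std::string project, platform, configuration, stage, build, engine;
};

std::string ProvenanceLine(const Provenance& p) {
    const std::string sep = " · ";
    return p.project + sep + "platform=" + p.platform + sep + "configuration=" +
           p.configuration + sep + p.stage + sep + p.build + sep + p.engine;
}
```

Failing the session when the server never listens, rather than launching clients anyway, turns a confusing "clients timed out" into a clear server-side failure.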
Part D — Authoring
- GetConfiguration(): RequireRole(Server) with the bot-map + listen args; two RequireRole(Client); MaxDuration sized for the slowest target: a 60s match plus boot/load/connect on Switch, e.g. 240–300s (justify, don't guess).
- Pass: both clients emit the match-completion marker and no role crashes. Fail: any crash/ensure, or a timeout with the marker unseen. Late-enough observable: a marker fired at match end (e.g. Smoke_MatchComplete), not at match start or level load.
- No hardcoded ports: take the server port from the framework's allocation (M4.4). This prevents the green-solo/red-in-batch port collision when nightlies run concurrently.
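Both Part D answers can be checked as plain logic: a MaxDuration built from written-down phase estimates, and a per-tick verdict over what each role has reported. A minimal C++ sketch under stated assumptions: the phase timings (60 s boot, 45 s map load, 15 s connect, 60 s match) and the 1.5× margin are hypothetical placeholders for measured Switch numbers, and a real TickTest() would read these observations from Gauntlet rather than from a struct.

```cpp
#include <cmath>
#include <vector>

// Hypothetical per-phase estimates, in seconds, for the slowest target.
struct PhaseBudget {
    int boot, mapLoad, connect, match;
};

// MaxDuration = (sum of phases) x safety margin, rounded up to whole seconds.
int MaxDurationSeconds(const PhaseBudget& b, double margin) {
    double total = (b.boot + b.mapLoad + b.connect + b.match) * margin;
    return static_cast<int>(std::ceil(total));
}

enum class Verdict { Running, Pass, Fail };

// What the test knows about one role at this tick.
struct RoleState {
    bool isClient;
    bool crashedOrEnsured;  // any crash or ensure seen on this role
    bool sawMatchComplete;  // end-of-match marker, e.g. Smoke_MatchComplete
};

// One tick of the smoke's verdict logic:
//  - a crash or ensure on any role fails immediately;
//  - every client having emitted the end-of-match marker passes;
//  - reaching maxDuration with a marker still unseen fails as a timeout;
//  - otherwise keep waiting.
Verdict Tick(const std::vector<RoleState>& roles, int elapsedSeconds, int maxDuration) {
    int clients = 0;
    bool allClientsDone = true;
    for (const RoleState& r : roles) {
        if (r.crashedOrEnsured) return Verdict::Fail;
        if (r.isClient) {
            ++clients;
            if (!r.sawMatchComplete) allClientsDone = false;
        }
    }
    if (clients > 0 && allClientsDone) return Verdict::Pass;
    if (elapsedSeconds >= maxDuration) return Verdict::Fail;
    return Verdict::Running;
}
```

With these placeholders the budget is (60 + 45 + 15 + 60) × 1.5 = 270 s, inside the 240–300 s range above; the value of the exercise is that every term can be challenged. Checking crashes before the marker means a server crash after both clients finished still fails the run.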
Part E — Game side
- Yes — running a bot match and detecting its completion is in-game sequencing/state, not in default logs (M5.1).
- C# node: launch the 3 roles, set MaxDuration, watch for both clients' Smoke_MatchComplete markers, fold in crash detection, and set the verdict. C++ controller (per client and/or server): start/observe the bot match, detect completion, and emit the marker. The verdict stays in C# (M5.2), because only the node sees all roles and their crashes.
- Marker: Smoke_MatchComplete, emitted at match end at a verbosity present in the configuration under test (M5.3). A screenshot at completion is reasonable insurance against "completed but rendered black"; a perf capture is optional for a smoke and belongs in a dedicated perf test, so say so rather than bolting it on.
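On the node side, "watch for both clients' markers and fold in crash detection" amounts to scanning each role's log and combining the results in the one place that sees every role. The sketch below models that in C++ so it can be tested in isolation; it is not Gauntlet's API. The marker name comes from the text, but the crash and ensure substrings are assumptions about what your engine version writes, so confirm them against a real crash and a real ensure, and keep the framework's own crash detection as the primary source.

```cpp
#include <string>
#include <vector>

struct RoleObservation {
    bool sawMarker = false;
    bool sawCrashOrEnsure = false;
};

// ASSUMED failure signatures; verify the exact text on a real build.
const std::vector<std::string> kFailureSignatures = {
    "=== Critical error: ===",
    "Ensure condition failed",
};

// Scans one role's log lines for the completion marker and failure signatures.
RoleObservation ScanRoleLog(const std::vector<std::string>& lines, const std::string& marker) {
    RoleObservation obs;
    for (const std::string& line : lines) {
        if (line.find(marker) != std::string::npos) obs.sawMarker = true;
        for (const std::string& sig : kFailureSignatures) {
            if (line.find(sig) != std::string::npos) obs.sawCrashOrEnsure = true;
        }
    }
    return obs;
}

// The verdict needs every role at once: both clients must report the marker,
// and no role (server included) may show a crash or ensure.
bool BothClientsCompleted(const RoleObservation& server,
                          const RoleObservation& client0,
                          const RoleObservation& client1) {
    bool anyFailure = server.sawCrashOrEnsure || client0.sawCrashOrEnsure ||
                      client1.sawCrashOrEnsure;
    return !anyFailure && client0.sawMarker && client1.sawMarker;
}
```

A controller on one client could emit its own marker, but only the node can see that the other client or the server crashed, which is why the verdict lives there.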
Part F — Pipeline & ownership
- Pre-submit Win64 boot red → blocks the submit (M6.1), so a change that breaks boot never lands. Nightly per-platform smoke red → marks that platform's nightly build not promotable and flags the CL, so a build where clients can't connect and finish a match is never promoted. Boot also gates promotion.
- Retry only on classified transient infra (pool/device unavailability, a known lab-network blip), capped (e.g. 1 retry), with every retry logged and surfaced: a green-after-retry is reported as such. Never retry past a reproducible crash or an isolation/marker bug; that launders a real signal (M6.2).
- Runbook header: (1) skipped or failed? grey → upstream cook/stage, stop. (2) crash summary first. (3) localize tier via symptom map (M1.3). (4) probes: other kits / solo vs batch / configuration-only / slow-target timeout / reproduces-clean. (5) classify + route: infra-or-test (you fix; retry/quarantine) vs. product (file with provenance + artifacts).
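The retry policy and the runbook's final classify-and-route step are small enough to encode, which keeps the cap and the never-retry cases from drifting between on-calls. A minimal C++ sketch: the failure classes, the one-retry cap, and the routes come from the answers above, while the enum names, route strings, and RetryDecision shape are hypothetical.

```cpp
#include <string>

enum class FailureClass {
    Skipped,            // grey: upstream cook/stage never produced a build
    TransientInfra,     // pool/device unavailable, known lab-network blip
    ReproducibleCrash,  // crashes again on a clean rerun
    IsolationOrMarker,  // test bug: port collision, marker never emitted, etc.
    Product,            // real regression in the game
};

struct RetryDecision {
    bool retry;
    std::string route;  // who owns the next step
};

constexpr int kMaxRetries = 1;

// Retries only classified transient infra, at most kMaxRetries times. Every
// other class is routed, never retried, so a real signal is not laundered.
RetryDecision Decide(FailureClass c, int retriesSoFar) {
    switch (c) {
        case FailureClass::Skipped:
            return {false, "upstream cook/stage"};
        case FailureClass::TransientInfra:
            if (retriesSoFar < kMaxRetries) return {true, "infra (retry, log it)"};
            return {false, "infra/test owner (quarantine, investigate)"};
        case FailureClass::IsolationOrMarker:
            return {false, "infra/test owner (fix the test)"};
        case FailureClass::ReproducibleCrash:
        case FailureClass::Product:
            return {false, "product (file with provenance + artifacts)"};
    }
    return {false, "unclassified"};
}

// A green reached only after a retry is reported as such, never as a plain pass.
std::string ReportGreen(int retriesUsed) {
    return retriesUsed == 0 ? "pass" : "pass-after-retry (" + std::to_string(retriesUsed) + ")";
}
```

Keeping the cap as a single named constant means raising it is a visible, reviewable change rather than a quiet edit to a pipeline script.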
Capstone done when
- [ ] Every part is answered as a connected design — Part D's marker is the one Part E emits and Part B asserts in a configuration where it survives.
- [ ] Boot coverage uses a built-in; the smoke is a justified custom test with a controller.
- [ ] The role layout, device choice, and build provenance are all specified.
- [ ] Both tests are placed in named gates with a stated retry policy and a triage runbook.
- [ ] Every version-specific assumption is flagged verify on a real build.
You've reached the end of the workbook. From here, the next step is a real source build: stand up
UE.EditorBootTest on your project, then UE.BootTest, then port your Skyward Siege capstone
design onto an actual .Automation project — verifying each flagged assumption as you go.