Elective Techniques — One Valid Approach Among Options
13 of 25 techniques · Workshop format
What “elective” means here
These are genuine techniques. A system applying them is doing something sound.
But a well-designed system could reasonably choose otherwise — these aren’t defaults.
Apply them when the situation calls for it, not as a checklist.
The prescriptive deck covers the other 12 — the ones you reach for by default.
Sessions
Session 1 — Decision Shaping
Session 2 — Evaluation Frameworks
Session 3 — Audit Techniques
Session 4 — Evolutionary Systems
Session 5 — Architecture Patterns
13 Techniques — This Deck
2. Seed Decision Trees
3. Steering
4. Conceptual Anchoring
5. Operational Cost Framing
6. Symmetry Detection
8. Domain-Neutral Primitives
9. Layer-Crossing Checks
10. Escape Hatch Design
11. Constitutional Layering
14. Backlog Pressure
17. Supersession Protocol
20. Heartbeat Architecture
21. Structured Communication
Session 1 of 5
Decision Shaping
Technique 2 — Seed Decision Trees
Technique 3 — Steering
Technique 4 — Conceptual Anchoring
Technique 2
Seed Decision Trees
Pre-loading the model's reasoning path with structured choices.
Three components of a seed tree
Without seed trees, the model's reasoning path is determined by training distribution, not your task structure.
Methodology — the reasoning path
Stopping conditions — when to halt
Conditional routing — branching logic
Methodology: "Start with X → then Y → then Z"
Ordered sequence of reasoning steps. The model follows your path instead of inventing one.
## Analysis methodology (follow this order)
1. Identify the system's core primitives (nouns that compose)
2. Map how primitives interact (dependencies, data flow)
3. For each interaction: does the interface contract hold?
4. Where contracts break: what's the operational cost?
DO NOT skip steps. DO NOT reorder. Step 3 requires step 2's output.
## Stop when:
- You've covered all public interfaces (completeness)
- You find >10 critical findings (triage-first)
- Remaining items are all severity:low (diminishing returns)
- You're repeating a finding you already noted (saturation)
## Do NOT stop because:
- "It's getting long" (length is not a stopping condition)
- "This seems thorough" (feeling ≠ completeness)
- You've reached some internal output limit
Models default to stopping when output "feels complete" — an artifact of training on human-length responses.
Conditional Routing
"If X, then Y; otherwise Z" — branching logic for the model's path.
## Routing rules
- If finding involves data loss → severity:critical, stop and report
- If finding is style-only → severity:low, batch at end
- If finding is ambiguous → include with confidence:medium
- If finding duplicates an earlier finding → merge, don't repeat
## Escalation
- If >3 critical findings: stop analysis, report immediately
- If finding contradicts the specification: flag as "spec conflict"
and continue (don't resolve — surface for human decision)
Routing prevents flat-priority output where everything appears equally important.
Example: Seeded methodology
## Analysis methodology (follow this order)
1. Identify the system's core primitives (nouns that compose)
2. Map how primitives interact (dependencies, data flow)
3. For each interaction: does the interface contract hold?
4. Where contracts break: what's the operational cost?
## Stopping conditions
- Stop when you've covered all public interfaces
- Stop if you find >10 critical findings (triage first)
- Stop if remaining items are all severity:low
## Routing
- If finding involves data loss → severity:critical, stop and report
- If finding is style-only → severity:low, batch at end
- If uncertain → include but mark confidence:medium
Exercise
A PR review bot. Instructions: "Review this PR. Flag important issues." It produces a flat list mixing typos, style suggestions, logic errors, and security vulnerabilities — all presented as equally important, in file order rather than priority order.
Discuss: What methodology would you seed? What stopping conditions prevent noise?
What routing rules separate severity tiers?
Some valid answers
A PR review bot. Instructions: "Review this PR. Flag important issues." It produces a flat list mixing typos, style suggestions, logic errors, and security vulnerabilities — all presented as equally important, in file order rather than priority order.
Q1.What methodology would you seed?
Step 1: Check for security vulnerabilities (SQL injection, XSS, auth bypass, secret exposure). Step 2: Check for logic errors (off-by-ones, edge cases, incorrect conditionals). Step 3: Check for style and maintainability. Process in this order; do not skip to step 3 first.
The ordering matters: security issues block merge, so they must surface first. Style issues are low-priority suggestions — they should never bury a critical security finding earlier in the output.
A dependency constraint: "complete step 1 before step 2; complete step 2 before step 3. Each step's output is independent — a step 1 finding doesn't prevent step 2, but step 3 should not inflate a report that already has critical step 1 findings."
Some valid answers
A PR review bot. Instructions: "Review this PR. Flag important issues." It produces a flat list mixing typos, style suggestions, logic errors, and security vulnerabilities — all presented as equally important, in file order rather than priority order.
Q2.What stopping conditions prevent noise?
Stop after 3 critical findings: if 3 or more security-critical issues exist, triage them first. Producing 15 more findings below them dilutes attention and delays the engineer getting to the critical items.
Stop when remaining items are all severity:low: if the only remaining issues are style suggestions, end the review and batch them separately. They don't belong in the same triage queue as logic errors.
Stop on saturation: "if you've seen this pattern 3 times, note the pattern once with examples — don't report the same finding 8 times across 8 instances. Repetition is a stopping signal."
Some valid answers
A PR review bot. Instructions: "Review this PR. Flag important issues." It produces a flat list mixing typos, style suggestions, logic errors, and security vulnerabilities — all presented as equally important, in file order rather than priority order.
Q3.What routing rules separate severity tiers?
The routing rule prevents the flat-list default: the bot can no longer present "SQL injection in auth.py" and "inconsistent indentation in utils.py" at the same visual priority. The severity tier is assigned in the methodology, not inferred at output time.
A routing rule for ambiguity: "if severity is unclear, assign medium and explain why. Don't suppress uncertain findings — include them with confidence:medium so the reviewer can assess." This prevents the bot from silently dropping low-confidence findings.
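One way to make the routing concrete is to hold the rules as data and apply them after the model emits findings, so severity ordering never depends on output-time judgment. A minimal sketch; the `Finding` shape, the category names, and the severity table are hypothetical, not a real bot's API:

```python
from dataclasses import dataclass

# Hypothetical finding record; field names are assumptions, not a real bot's API.
@dataclass
class Finding:
    category: str        # "security" | "logic" | "style" | ...
    description: str
    confidence: str = "high"

# Routing table: category maps to a severity tier (mirrors the seeded routing rules).
SEVERITY_BY_CATEGORY = {"security": "critical", "logic": "high", "style": "low"}
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}

def route(findings: list[Finding]) -> list[tuple[str, Finding]]:
    """Assign severity from the routing table and sort by tier, not file order."""
    routed = []
    for f in findings:
        severity = SEVERITY_BY_CATEGORY.get(f.category, "medium")  # ambiguous: default to medium
        routed.append((severity, f))
    return sorted(routed, key=lambda pair: SEVERITY_ORDER[pair[0]])

findings = [
    Finding("style", "inconsistent indentation in utils.py"),
    Finding("security", "SQL injection in auth.py"),
]
for severity, f in route(findings):
    print(f"[{severity.upper()}] {f.description}")
```

The output leads with the security finding regardless of file order, which is exactly the flat-list failure the routing rules exist to prevent.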
Technique 3
Steering
Guide toward desired patterns without over-constraining.
The constraint spectrum
Too loose: unpredictable output
↕
Steering: soft targets + escape hatches
↕
Too tight: brittle, breaks on edge cases
Templates with exits: "Use this format, BUT deviate when..."
Disclosure preference: "When uncertain, include but mark it"
Example: Soft targets
## Output guidance (steering, not constraints)
Typical audits surface 3-7 findings. This is a soft target:
- Fewer is fine if the system is genuinely clean
- More is fine if genuine issues exist
- DON'T manufacture findings to hit the range
- DON'T suppress findings to stay under it
For each finding, typical length is 2-4 sentences.
Expand if the finding requires context to understand.
Contract if it's self-evident from the code snippet.
If you find yourself writing more than 10 findings,
pause and triage: are all genuinely actionable?
Exercise
A content moderation system for a developer forum. Rules: "Posts must be on-topic. Remove spam. Keep discussions professional." A post discusses using a competitor's product to solve a problem the community's product can't handle. Moderator bot flags it as "off-topic / competitive spam."
Discuss: What's a hard constraint vs. soft target here? What escape hatch is missing?
How do you steer without over-constraining?
Some valid answers
A content moderation system for a developer forum. Rules: "Posts must be on-topic. Remove spam. Keep discussions professional." A post discusses using a competitor's product to solve a problem the community's product can't handle. Moderator bot flags it as "off-topic / competitive spam."
Q1.What's a hard constraint vs. soft target here?
Hard constraint: "remove spam" — mechanically definable as repeated links, keyword stuffing, irrelevant promotions. This can and should be enforced, not steered.
Soft target: "on-topic" and "professional" — these are judgment calls. A post that's technical, solves a real problem, and mentions a competitor is on-topic by any reasonable definition but fails a naive "about our product" reading.
"Keep discussions professional" is a soft target too — professional behavior is contextual. A heated technical debate that stays technical is professional; a polite post that spreads misinformation isn't. Professional ≠ polite.
Some valid answers
A content moderation system for a developer forum. Rules: "Posts must be on-topic. Remove spam. Keep discussions professional." A post discusses using a competitor's product to solve a problem the community's product can't handle. Moderator bot flags it as "off-topic / competitive spam."
Q2.What escape hatch is missing?
The missing escape hatch: "mentioning a competitor is NOT spam when the mention is in the context of solving a technical problem. The question is whether the post helps the community; competitor mentions are incidental, not disqualifying."
A broader escape hatch structure: "if a post is borderline on the 'on-topic' criterion but clearly helps other developers with a real technical problem, lean toward allowing it. The goal is a useful community, not a promotional one."
Without the escape hatch, "on-topic" tightens over time as the model learns from its own moderation decisions. Each borderline case treated as off-topic trains the boundary tighter. The escape hatch preserves the intended breadth.
Some valid answers
A content moderation system for a developer forum. Rules: "Posts must be on-topic. Remove spam. Keep discussions professional." A post discusses using a competitor's product to solve a problem the community's product can't handle. Moderator bot flags it as "off-topic / competitive spam."
Q3.How do you steer without over-constraining?
Define the target as a range with escape: "posts are on-topic if they address a technical problem developers in this community face. Competitor mentions in that context are acceptable; posts whose primary purpose is competitor promotion are not."
Add a disclosure preference for borderline cases: "if a post is ambiguous, allow it and flag for human review rather than removing it. The cost of a false removal (chilling legitimate content) is higher than the cost of a false allowance."
Soft target with an example at each boundary: "on-topic example: [helpful competitor comparison]. Off-topic example: [pure competitor criticism with no technical content]. The rule is the range between these, not a point."
Technique 4
Conceptual Anchoring
Identify and reason from system primitives, not surface patterns.
Primitives > patterns
Prevents hallucination: findings grounded in operational reality
Transfers across domains: same technique for code, prose, research
"Where does the map break down?" ← violations live here
Example: Anchoring an audit
## System primitives (your anchors)
This system has 4 primitive concepts:
- **User**: has roles, belongs to organizations
- **Document**: has owner, has visibility, has versions
- **Permission**: maps (user, document, action) → allow/deny
- **Audit trail**: immutable log of permission checks
## Your job: verify that every Document operation checks Permission before executing.
Anchor in Permission. For each Document endpoint:
1. Does it call Permission.check() before acting?
2. Does the Permission check match the action being taken?
3. Is the result logged to Audit trail?
Findings are grounded in these primitives — not in code style,
naming conventions, or aesthetic preferences.
Exercise
A CLI tool with 8 subcommands. Each subcommand has --help text. Most follow: Synopsis (usage), Description paragraph, Options list, Examples section. Two subcommands omit Examples. One has Synopsis + Examples only (no Description).
Discuss: What are the primitives? What's a violation?
What's justified divergence?
Some valid answers
A CLI tool with 8 subcommands. Each subcommand has --help text. Most follow: Synopsis (usage), Description paragraph, Options list, Examples section. Two subcommands omit Examples. One has Synopsis + Examples only (no Description).
Q1.What are the primitives?
A --help text entry has four primitives: Synopsis (what does this command do, one line), Description (why/when would you use it), Options (reference for each flag), Examples (how to use it in practice). These compose into a complete help entry.
Each primitive serves a different reader mode: Synopsis for scanning (what is this command), Description for evaluation (should I use this), Options for reference (what flags exist), Examples for learning (how do I actually use it).
The audit is: for each subcommand, which primitives are present? Which are missing? Is the omission intentional (the primitive genuinely doesn't apply) or incidental (someone forgot)?
Some valid answers
A CLI tool with 8 subcommands. Each subcommand has --help text. Most follow: Synopsis (usage), Description paragraph, Options list, Examples section. Two subcommands omit Examples. One has Synopsis + Examples only (no Description).
Q2.What's a violation?
The two subcommands missing Examples sections: if Examples serve the "learning" reader mode, their absence leaves that reader stranded. The violation is that users who want to know how to use the command have no reference to copy from.
The subcommand with Synopsis + Examples but no Description: depends on what the subcommand does. If "rm — removes a file" is self-explanatory from Synopsis, Description adds no value. If it's a complex command with edge cases, the Description omission is a violation.
Violation = a missing primitive that would have served a reader. Not-violation = a missing primitive that genuinely doesn't apply. The anchor is "does this reader need this?"
Some valid answers
A CLI tool with 8 subcommands. Each subcommand has --help text. Most follow: Synopsis (usage), Description paragraph, Options list, Examples section. Two subcommands omit Examples. One has Synopsis + Examples only (no Description).
Q3.What's justified divergence?
A "version" subcommand that shows the app version has no meaningful Examples section — "run `app version`" is the entire usage. The primitive doesn't apply; its absence is correct.
A subcommand might legitimately omit Description if the Synopsis is precise enough to convey both function and use case. The test: can a new user read the Synopsis alone and know when they'd reach for this subcommand? If yes, Description is optional.
Divergence is justified when it's intentional and documented. The distinction that matters is omission-by-accident versus omission-by-design: the audit should flag both, but the response differs. Accidental omissions get fixed; intentional ones get rationale added so future audits know not to flag them.
Session 2 of 5
Evaluation Frameworks
Technique 5 — Operational Cost Framing
Technique 6 — Symmetry Detection
Technique 8 — Domain-Neutral Primitives
Technique 5
Operational Cost Framing
Evaluate by impact, not by difference. "What breaks?" not "what's different?"
Three cost dimensions
Operational cost: "What breaks if this persists?"
Cognitive cost: "What mental load does this add for readers?"
Future cost: "What maintenance trap does this create?"
Models default to noticing variation.
Cost framing forces evaluation of impact.
Reduces false positives from style differences.
Example: Cost-based severity
## Severity = cost, not deviation
CRITICAL (operational cost: system breaks)
- Missing null check on user input → crash in production
- Unvalidated redirect URL → open redirect vulnerability
HIGH (operational cost: data/security compromise)
- Hardcoded API key → secret exposure on public repo
- Missing rate limit → resource exhaustion possible
MEDIUM (cognitive cost: readers misled)
- Function name doesn't match behavior → debugging time
- Docs say X, implementation does Y → wrong assumptions
LOW (future cost: maintenance burden)
- Inconsistent naming → grep becomes unreliable
- Dead code → codebase noise, no runtime impact
NOT A FINDING (style variation, no cost)
- Tabs vs spaces (if no standard set)
- Comment style differences (if functionally equivalent)
Exercise
A shopping cart service audit flags: (1) Cart totals use floating-point math instead of integer cents. (2) "Remove item" button is labeled "Delete." (3) Cart doesn't persist across sessions. (4) Tax calculation rounds per-item instead of on the total.
Discuss: What's the operational cost of each? Which are critical vs. cosmetic?
What looks high-severity but isn't? What looks low but is actually critical?
Some valid answers
A shopping cart service audit flags: (1) Cart totals use floating-point math instead of integer cents. (2) "Remove item" button is labeled "Delete." (3) Cart doesn't persist across sessions. (4) Tax calculation rounds per-item instead of on the total.
Q1.What's the operational cost of each?
(1) Floating-point totals: penny rounding errors on every transaction, compounding over thousands of orders. Accounting reconciliation fails; financial audits flag discrepancies. Legal/regulatory exposure. CRITICAL.
(2) "Delete" vs. "Remove": cosmetic. Both are standard verbs for removing an item. No data integrity issue, no user confusion severe enough to cause loss. LOW.
(3) No session persistence: items lost when the user closes their browser or navigates away. High user frustration, potential cart abandonment. MEDIUM — operational cost is friction and lost conversions, not data corruption.
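Returning to (1), a quick way to see why floating-point totals are operationally critical rather than cosmetic: dollar floats drift from the true total, integer cents do not. A minimal illustration; the prices are made up:

```python
# Summing a ten-cent price three times: floating-point dollars vs. integer cents.
price_dollars = 0.10
price_cents = 10

float_total = sum(price_dollars for _ in range(3))
cents_total = sum(price_cents for _ in range(3))

print(float_total)            # 0.30000000000000004, already off by a hair
print(float_total == 0.30)    # False
print(cents_total == 30)      # True; integer cents stay exact
```

The per-line error is tiny, but it accumulates across thousands of orders, which is why the severity is grounded in reconciliation failure rather than style.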
Some valid answers
A shopping cart service audit flags: (1) Cart totals use floating-point math instead of integer cents. (2) "Remove item" button is labeled "Delete." (3) Cart doesn't persist across sessions. (4) Tax calculation rounds per-item instead of on the total.
Q2.Which look high-severity but aren't? Which look low but are actually critical?
(3) Looks high (users complain loudly about losing carts) but is MEDIUM — the cost is friction and frustration, not money or correctness. User complaints are not the same as operational cost.
(4) Tax per-item rounding looks like a rounding preference, almost cosmetic. It's actually CRITICAL: tax rounding errors create legally incorrect invoices that fail compliance audits. The financial and legal cost is real and accumulating on every transaction.
The lesson: severity should be grounded in what breaks — legal/financial systems, data integrity, or security — not in how visible or complained-about the issue is. Loudly complained-about ≠ high operational cost.
Some valid answers
A shopping cart service audit flags: (1) Cart totals use floating-point math instead of integer cents. (2) "Remove item" button is labeled "Delete." (3) Cart doesn't persist across sessions. (4) Tax calculation rounds per-item instead of on the total.
Q3.Which would you fix first?
Fix (1) first: floating-point math is actively corrupting financial records on every transaction. Every order placed between now and the fix is a compounding error in the accounting system.
Fix (4) second: per-item tax rounding is a legal compliance issue on every invoice. The fix is low-effort (integer arithmetic with one rounding step at the total level) and the risk of not fixing it is regulatory.
(3) and (2) are lower priority — cart persistence is a user experience improvement, not a correctness fix, and the label change is cosmetic. They matter, but not before the financial integrity issues.
Technique 6
Symmetry Detection
Notice when patterns hold almost everywhere — then investigate the breaks.
One-offs vs. broken symmetry
"8/10 endpoints validate input. 2 don't."
That's not two outliers. That's broken symmetry.
Either the 2 have justification, or they're bugs.
Quantify pattern adherence (not just "usually follows")
Check for documented justification on breaks
Assess cost of the asymmetry (back to technique 5)
Example: Symmetry audit
## Task: Identify broken symmetry in this API
For each pattern you observe:
1. How many endpoints follow it? (N/total)
2. Which endpoints break it? (list specifically)
3. For each break: is there documented justification?
4. If no justification: what's the operational cost?
## Example finding format:
Pattern: "All POST endpoints validate request body schema"
Adherence: 11/13 endpoints
Breaks: POST /legacy/import, POST /admin/bulk-update
Justification: None documented
Cost: Malformed input reaches business logic → potential crash
Severity: HIGH (broken symmetry with operational cost)
Exercise
A component library with 20 components. 17 accept a `size` prop with values "sm" | "md" | "lg". Of the remaining 3: one uses "small" | "medium" | "large", one has no size prop at all, and one accepts a numeric pixel value.
Discuss: What's the symmetry? Which breaks are justified?
What's the operational cost of each asymmetry? How would you report this?
Some valid answers
A component library with 20 components. 17 accept a `size` prop with values "sm" | "md" | "lg". Of the remaining 3: one uses "small" | "medium" | "large", one has no size prop at all, and one accepts a numeric pixel value.
Q1.What's the symmetry?
Pattern: 17/20 components accept a size prop with values "sm" | "md" | "lg". The adherence rate (85%) is high enough to call this a canonical convention for the library.
The symmetry extends beyond just the values: 17 components share the same prop name ("size"), the same enum type, and the same three values. This is a strong interface contract that the three exceptions break in different ways.
The audit question is: for each break, is the deviation intentional (justified by the component's nature) or incidental (someone used different naming without thinking)? The answer is different for each of the three.
Some valid answers
A component library with 20 components. 17 accept a `size` prop with values "sm" | "md" | "lg". Of the remaining 3: one uses "small" | "medium" | "large", one has no size prop at all, and one accepts a numeric pixel value.
Q2.Which breaks are justified?
"small"/"medium"/"large" — unjustified. This is the same concept with different names. The operational cost is that TypeScript autocomplete diverges, documentation is inconsistent, and a developer writing a themed wrapper has to handle two naming conventions.
No size prop at all — possibly justified. A divider or decorator component might genuinely have one fixed presentation. The audit should flag it for review, not as a definite violation. The question is: "should this component ever render at different sizes?"
Numeric pixel value — possibly justified for specific components. A spacer or layout primitive that needs arbitrary sizing can't be constrained to three enum values. The justification should be documented ("this component accepts arbitrary spacing, not discrete size tiers") so future developers understand the intentional divergence.
Some valid answers
A component library with 20 components. 17 accept a `size` prop with values "sm" | "md" | "lg". Of the remaining 3: one uses "small" | "medium" | "large", one has no size prop at all, and one accepts a numeric pixel value.
Q3.What's the operational cost? How would you report this?
"small"/"medium"/"large" break: developers using the library must memorize two size APIs. TypeScript doesn't catch mismatches (both are valid strings). Grep for "size=" returns inconsistent results. Rename it in the fix — the cost of the inconsistency accumulates with every consumer.
How to report it: "Pattern: 17/20 components use size: 'sm' | 'md' | 'lg'. Break: ComponentX uses 'small' | 'medium' | 'large' (no documented justification). Operational cost: dual API surface for consumers. Recommended fix: align to 'sm'|'md'|'lg'."
The report format matters: don't just list the breaks — quantify adherence (17/20), identify the break, check for justification, and assess cost. A report that says "3 inconsistencies found" is less useful than one that distinguishes two unjustified naming divergences from one intentional design choice.
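The adherence count and break list are mechanical enough to script before any judgment is applied. A sketch over a hypothetical component manifest; the names and the manifest shape are illustrative, not from a real library:

```python
from collections import Counter

# Hypothetical manifest: component name -> the size API it exposes (None = no size prop).
components = {
    "Button": "sm|md|lg", "Input": "sm|md|lg", "Badge": "sm|md|lg",
    "Alert": "small|medium|large",   # naming divergence
    "Divider": None,                 # no size prop at all
    "Spacer": "numeric-px",          # arbitrary pixel value
}

apis = Counter(components.values())
canonical, adherence = apis.most_common(1)[0]

print(f"Pattern: size prop = {canonical!r}")
print(f"Adherence: {adherence}/{len(components)} components")
for name, api in components.items():
    if api != canonical:
        print(f"Break: {name} uses {api!r}; check for documented justification")
```

In the real library the count would be 17/20; the point is that the report starts from quantified adherence, then asks about justification and cost.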
Technique 8
Domain-Neutral Primitives
Design prompts that work across artifact types: code, prose, research, systems.
Abstract > specific
Use "conceptual units" not "modules" or "paragraphs"
Use "composition boundary" not "function interface" or "section break"
Use "coherence" not "code correctness" or "logical flow"
Test: can your prompt audit both a CLI tool
AND a research paper without modification?
Same auditor for software, papers, component libraries, organizational docs.
Example: Domain-neutral language
## Audit framework (works across domains)
For any artifact, identify:
1. **Units**: the composable pieces (functions/paragraphs/components)
2. **Boundaries**: where units meet (interfaces/transitions/edges)
3. **Contracts**: what each boundary promises (types/claims/invariants)
4. **Coherence**: do units compose without contradiction?
## Apply to:
- Source code: units=functions, boundaries=APIs, contracts=types
- Research paper: units=sections, boundaries=transitions, contracts=claims
- Design system: units=components, boundaries=props, contracts=variants
The audit technique is identical. Only the vocabulary maps change.
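One way to keep this operational is to hold the framework text constant and inject a per-domain vocabulary map at prompt-build time. A minimal sketch; the map entries just restate the table above:

```python
# The audit framework never changes; only the vocabulary map does.
FRAMEWORK = """For this artifact, identify:
1. Units: the composable pieces ({units})
2. Boundaries: where units meet ({boundaries})
3. Contracts: what each boundary promises ({contracts})
4. Coherence: do units compose without contradiction?"""

VOCAB = {
    "source code":    {"units": "functions",  "boundaries": "APIs",        "contracts": "types"},
    "research paper": {"units": "sections",   "boundaries": "transitions", "contracts": "claims"},
    "design system":  {"units": "components", "boundaries": "props",       "contracts": "variants"},
}

def audit_prompt(domain: str) -> str:
    """Same audit technique for every domain; only the vocabulary mapping changes."""
    return FRAMEWORK.format(**VOCAB[domain])

print(audit_prompt("research paper"))
```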
Exercise
A multi-tenant SaaS audit prompt says: "Check that tenant data doesn't leak between organizations. Review database queries for proper WHERE clauses. Verify API endpoints check org_id."
Discuss: What's domain-specific language here? How would you rephrase to also audit
a document-sharing system's permission model? A component library's theme isolation?
What vocabulary transfers across all three?
Some valid answers
A multi-tenant SaaS audit prompt says: "Check that tenant data doesn't leak between organizations. Review database queries for proper WHERE clauses. Verify API endpoints check org_id."
Q1.What's domain-specific language here?
"Tenant data," "organizations," "WHERE clauses," "org_id" — all specific to multi-tenant SaaS data architecture. None of these terms apply to a document-sharing system or a component library.
"Database queries" is somewhat domain-neutral but presupposes a relational data model. A document-sharing system with a different persistence layer needs different vocabulary for the same underlying concept.
The domain-specific framing constrains the prompt to one class of systems — it can't be reused without rewriting, even when the underlying principle (scope isolation) is identical across all three systems.
Some valid answers
A multi-tenant SaaS audit prompt says: "Check that tenant data doesn't leak between organizations. Review database queries for proper WHERE clauses. Verify API endpoints check org_id."
Q2.How would you rephrase to audit all three?
Multi-tenant SaaS: "scope" = org, "isolation boundary" = data that belongs to one org must not be accessible to another, "boundary check" = every data access is scoped to the requesting org's identifier.
Document-sharing: "scope" = user + permission set, "isolation boundary" = a document visible to user A must not be accessible to user B without a permission grant, "boundary check" = every document access verifies the requester's permissions.
Component library theme isolation: "scope" = component instance, "isolation boundary" = a component's theme tokens must not bleed into sibling components, "boundary check" = CSS scoping or prop encapsulation prevents style leakage across component boundaries.
Some valid answers
A multi-tenant SaaS audit prompt says: "Check that tenant data doesn't leak between organizations. Review database queries for proper WHERE clauses. Verify API endpoints check org_id."
Q3.What vocabulary transfers across all three?
"Scope" (the unit that defines what a principal can access), "isolation boundary" (where scope A must not cross into scope B), "boundary check" (the mechanism that enforces the boundary), and "crossing detection" (how you verify that boundary checks are happening).
The transferable one-sentence audit: "Verify that every access to a resource in scope A checks the requester's right to that scope before granting access." This applies to tenants, documents, and theme tokens without modification.
Domain-specific vocabulary (tenant, org_id, WHERE clause) can be added as examples: "In a multi-tenant DB, this check is a WHERE org_id = :current_org; in a document system, it's a permission.check(user, document); in a component library, it's CSS encapsulation." The principle is shared; the vocabulary maps are per-domain footnotes.
Session 3 of 5
Audit Techniques
Technique 9 — Layer-Crossing Checks
Technique 10 — Escape Hatch Design
Technique 9
Layer-Crossing Checks
Verify alignment across abstraction layers. Most bugs live at boundaries.
Where integrity violations hide
Documentation → says "validates all input"
↕ BOUNDARY (check here)
Implementation → skips validation on 2 endpoints
Schema → defines field as required
↕ BOUNDARY (check here)
Usage → passes null without error
Single-layer audits miss drift between layers. Anchor in one layer, validate in another.
Boundary: Docs ↔ Code
Documentation says one thing. Implementation does another.
## Docs ↔ Code verification
For each documented feature:
1. Find the claim in documentation (README, API docs, comments)
2. Locate the implementation
3. Verify: does the implementation honor the documented behavior?
Common drift patterns:
- Docs describe v1 behavior; code has been patched to v3
- Docs promise error handling that doesn't exist
- Docs say "required" but code has a default fallback
- Feature was removed but docs still reference it
Docs ↔ Code drift is the most common and least detected. Tests verify code, not docs.
Boundary: Schema ↔ Usage
Schema defines a contract. Callers violate it silently.
## Schema ↔ Usage verification
For each schema definition:
1. Read the schema constraints (required fields, types, enums)
2. Find all call sites / consumers
3. Verify: does every caller satisfy the schema?
Red flags:
- Schema says "required" but 3 callers pass undefined
- Schema says enum ["draft","published"] but code also uses "archived"
- Schema says maxLength:100 but no truncation at input boundary
- Schema updated, but callers unchanged (version skew)
Schemas are promises. Code that bypasses the validation layer breaks promises silently.
Boundary: Claim ↔ Evidence
System makes an assertion. Evidence doesn't support it.
## Claim ↔ Evidence verification
For each system claim (logs, metrics, status pages, user-facing messages):
1. Identify the claim ("99.9% uptime", "processed successfully", "validated")
2. Trace to the evidence source
3. Verify: does the evidence actually demonstrate the claim?
Failure modes:
- "Processed" means "received" — not "completed"
- "Validated" means "parsed" — not "business-rule checked"
- "Healthy" means "responding" — not "correct"
- Metric measures attempts, claim reports "successes"
Most dangerous in LLM systems: "the model confirmed" ≠ "the answer is correct."
Example: Cross-layer checklist
Prompt literal — a verification framework
## Layer-crossing verification
For each feature, check these boundaries:
| Source layer | Target layer | Question |
|---|---|---|
| README | Implementation | Does code do what docs claim? |
| Schema | Usage | Are required fields always provided? |
| Error messages | Error handling | Do messages match actual error types? |
| API docs | Route handlers | Do params/returns match? |
| Config defaults | Runtime behavior | Does the default actually apply? |
Process: Pick source layer. Read its claims. Cross to target
layer. Verify each claim has matching implementation.
Mismatches at boundaries are findings.
Exercise
An API has an OpenAPI spec (doc layer) and Express route handlers (implementation layer). Spec says PUT /users/:id requires `{ name, email }` in body. Handler actually accepts `{ name, email, role }` — and setting role=admin works, bypassing the invitation flow.
Discuss: What layers exist here? What's the boundary violation?
What's the operational cost? What other layer-crossings would you check in this system?
Some valid answers
An API has an OpenAPI spec (doc layer) and Express route handlers (implementation layer). Spec says PUT /users/:id requires `{ name, email }` in body. Handler actually accepts `{ name, email, role }` — and setting role=admin works, bypassing the invitation flow.
Q1.What layers exist here?
Documentation layer: the OpenAPI spec — the public contract for what the API accepts and returns. Authorization layer: the invitation flow — the business rule that controls how role=admin is granted. Implementation layer: the Express handler — what the code actually does.
A fourth layer exists implicitly: the test suite. If tests were written against the OpenAPI spec, they would not test for role=admin in the request body — because the spec doesn't document it. The vulnerability lives below the test floor.
The vulnerability is an undocumented layer-crossing: the implementation accepts input the spec doesn't acknowledge, and that input bypasses an authorization layer the spec implies is the only path to elevated roles.
Some valid answers
An API has an OpenAPI spec (doc layer) and Express route handlers (implementation layer). Spec says PUT /users/:id requires `{ name, email }` in body. Handler actually accepts `{ name, email, role }` — and setting role=admin works, bypassing the invitation flow.
Q2.What's the boundary violation?
The spec says 2 fields; the handler accepts 3. The undocumented field is not just an inconsistency — it has real authorization consequences. Setting role=admin bypasses the invitation flow, granting elevated access without the intended review step.
This is a docs↔code violation with security impact: the spec is the source of truth for what the API accepts, and the implementation silently extends it with a privilege escalation surface. Anyone reading only the spec would never find this vulnerability.
The authorization layer (invitation flow) has a claimed guarantee: "admin access requires an invitation." The implementation makes that guarantee false without being visible at the authorization layer. That's a claim↔evidence violation nested inside the docs↔code violation.
Some valid answers
An API has an OpenAPI spec (doc layer) and Express route handlers (implementation layer). Spec says PUT /users/:id requires `{ name, email }` in body. Handler actually accepts `{ name, email, role }` — and setting role=admin works, bypassing the invitation flow.
Q3.What other layer-crossings would you check?
Error responses: does the spec document the error format for 400/401/403 responses? Do the handlers return exactly that format, or do some return raw Express error objects with stack traces?
Rate limiting: if the spec documents rate limits (429 after N requests), does the middleware enforce them? Or does the spec describe a protection the implementation doesn't actually provide?
Authentication requirements: for each endpoint the spec marks as "authenticated," does the middleware chain actually enforce authentication? A missing authMiddleware in one route handler would be invisible from the spec layer but exploitable at the implementation layer.
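The role=admin gap is detectable mechanically once you can enumerate both layers: the fields the spec documents and the fields the handler actually reads. A sketch under that assumption; the field sets are hardcoded stand-ins for whatever spec parsing and route analysis your stack allows:

```python
# Fields the OpenAPI spec documents for PUT /users/{id} (documentation layer).
spec_fields = {"name", "email"}

# Fields the handler actually reads from the request body (implementation layer).
# In practice these would come from a route audit or static analysis; hardcoded here.
handler_fields = {"name", "email", "role"}

undocumented = handler_fields - spec_fields
unimplemented = spec_fields - handler_fields

for field in sorted(undocumented):
    print(f"FINDING: handler accepts undocumented field '{field}'; "
          "check whether it crosses an authorization boundary")
for field in sorted(unimplemented):
    print(f"FINDING: spec documents '{field}' but the handler ignores it")
```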
Technique 10
Escape Hatch Design
Build flexibility into rigid templates without losing structure.
Hard constraints + permission to break them
"Follow this template OR explain why you didn't"
>
"Follow this template" (no escape)
Edge cases always emerge
Models need permission to deviate when justified
Deviation with explanation > silent rule-breaking
Deviation with explanation > forced compliance that distorts output
Example: Template with exits
## Output format
For each finding, use:
### [SEVERITY] Finding title
**Location:** file:line
**Evidence:** [code snippet or quote]
**Impact:** [operational cost]
**Fix:** [specific remediation]
## Escape hatches
- If finding spans multiple files: use "Location: [list]"
instead of single file:line
- If evidence requires >10 lines: summarize + reference
the full span ("see lines 42-89")
- If fix is non-obvious: replace "Fix:" with "Options:"
and list 2-3 alternatives with tradeoffs
- If you can't determine severity: use "UNCERTAIN" and explain
what information would resolve it
Exercise
A form builder requires every field to have: a label, a validation rule, a help text tooltip, and an error message. An engineer adds a "spacer" element (pure layout, takes no input). System rejects it — no validation rule, no error message.
Discuss: What are the template primitives? What's the violation?
What escape hatch accommodates the spacer without weakening the contract for real input fields?
Some valid answers
A form builder requires every field to have: a label, a validation rule, a help text tooltip, and an error message. An engineer adds a "spacer" element (pure layout, takes no input). System rejects it — no validation rule, no error message.
Q1.What are the template primitives?
For input fields: label (what is this field?), validation rule (what values are accepted?), help text (tooltip explaining the expected input), error message (what to show when validation fails). All four serve user interaction with an input element.
The template assumes a one-to-one mapping: every element in the form IS an input field. The primitives were derived from the common case and generalized incorrectly to "all elements."
The spacer element is a layout element, not an input element. It doesn't accept user input, has no validation state, and needs none of the four primitives. The constraint is correct for its intended scope; the error is over-application of that scope.
Some valid answers
A form builder requires every field to have: a label, a validation rule, a help text tooltip, and an error message. An engineer adds a "spacer" element (pure layout, takes no input). System rejects it — no validation rule, no error message.
Q2.What's the violation?
The system rejects a valid use case: a spacer element is a legitimate form element (many form builders need layout control) but the validation contract prevents it from being expressed. The system is more rigid than the problem requires.
The violation is in the scope assumption, not the contract itself. "All form elements must have label + validation + help + error" is correct for input elements. Applying it to all elements conflates element types.
A type-blind validation contract cannot distinguish between "missing required field" (violation) and "element type doesn't need this field" (legitimate). Both look identical to the validator.
Some valid answers
A form builder requires every field to have: a label, a validation rule, a help text tooltip, and an error message. An engineer adds a "spacer" element (pure layout, takes no input). System rejects it — no validation rule, no error message.
Q3.What escape hatch accommodates the spacer?
Introduce element types: type: "input" | "layout". Input elements require all four properties; layout elements require none. This strengthens the contract for input elements (it now applies specifically and unambiguously) while accommodating layout elements.
The escape hatch is the type distinction, not the relaxation of constraints. The rules for input fields are unchanged — in fact, they're now clearer because the scope is explicit. The validator can enforce them more strictly on elements it knows are input fields.
An alternative escape hatch: allow elements to declare which primitives apply to them (required_primitives: ["label", "validation"]) with the default being all four. This is more flexible but harder to reason about — the type-based approach is cleaner when the element categories are stable.
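A sketch of the type-based escape hatch, assuming form elements are plain dicts with a `type` field; the required-property lists mirror the contract described above:

```python
# Required properties per element type: the contract is scoped, not relaxed.
REQUIRED_BY_TYPE = {
    "input": {"label", "validation", "help_text", "error_message"},
    "layout": set(),   # spacers and other layout elements need none of the four
}

def validate_element(element: dict) -> list[str]:
    """Return violations; an unknown type is itself a violation, not a free pass."""
    element_type = element.get("type")
    if element_type not in REQUIRED_BY_TYPE:
        return [f"unknown element type: {element_type!r}"]
    missing = REQUIRED_BY_TYPE[element_type] - element.keys()
    return [f"missing required property: {prop}" for prop in sorted(missing)]

spacer = {"type": "layout"}
broken_input = {"type": "input", "label": "Email"}

print(validate_element(spacer))        # []; layout elements pass without the four fields
print(validate_element(broken_input))  # still flags the genuinely missing properties
```

The contract for input fields is unchanged and, if anything, easier to enforce, because the validator now knows exactly which elements it applies to.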
Session 4 of 5
Evolutionary Systems
Technique 11 — Constitutional Layering
Technique 17 — Supersession Protocol
Technique 14 — Backlog Pressure
Technique 11
Constitutional Layering
Prompts as stratified documents where later layers amend earlier ones with explicit precedence.
Why flat prompts break
Flat prompt
Every sentence equally weighted
Conflicts unresolvable
Must rewrite to amend
No maintenance path
→
Layered prompt
Explicit precedence hierarchy
Later layers override earlier
Add amendments, don't edit
Model reasons about intent
Example: Layered system prompt
Prompt literal — the system prompt itself
## Layer 0 — Identity (immutable)
You are a code reviewer specializing in security.
## Layer 1 — Governance (rarely changes)
Report findings with severity: critical > high > medium > low.
Never suppress a finding to reduce noise.
## Layer 2 — Session override (supersedes Layer 1)
UPDATED 2024-03: For this codebase, suppress medium/low
findings in test files. Rationale: test noise was obscuring
critical production findings. Layer 1 still applies to src/.
The model sees the evolution. It knows WHY the override exists.
Exercise
A customer support chatbot. Original prompt: "Be helpful, resolve issues quickly." After complaints about over-promising refunds, amendment added: "Don't promise refunds over $50 without escalation." Three months later, holiday override: "$50 limit raised to $100 for loyalty-tier customers."
Discuss: What are the layers? What happens if you edit the original instead of amending?
How does a new instance understand the policy history?
Some valid answers
A customer support chatbot. Original prompt: "Be helpful, resolve issues quickly." After complaints about over-promising refunds, amendment added: "Don't promise refunds over $50 without escalation." Three months later, holiday override: "$50 limit raised to $100 for loyalty-tier customers."
Q1.What are the layers?
Layer 1 is the original principle: "Be helpful, resolve issues quickly." Layer 2 is the amendment: don't promise refunds over $50 without escalation, added after the over-promising complaints. Layer 3 is the holiday override: the $50 limit raised to $100 for loyalty-tier customers.
Each layer has a scope: the original applies universally, the amendment narrows the refund case specifically, the holiday override is time-scoped and tier-scoped.
The override is a layer, not an edit — it coexists with the amendment. On January 2nd, Layer 3 should expire, leaving Layer 2 intact.
Some valid answers
A customer support chatbot. Original prompt: "Be helpful, resolve issues quickly." After complaints about over-promising refunds, amendment added: "Don't promise refunds over $50 without escalation." Three months later, holiday override: "$50 limit raised to $100 for loyalty-tier customers."
Q2.What happens if you edit the original instead of amending?
The reasoning trail disappears — a new instance sees "$100 for loyalty customers" but doesn't know it started at $50, or why, or that the change was seasonal.
Silent regression risk: the next person to edit thinks $100 is the permanent policy and might remove the loyalty-tier qualifier when "cleaning up."
The original "be helpful, resolve quickly" framing is gone — future instances lose the governing principle that justified all the exceptions.
Some valid answers
A customer support chatbot. Original prompt: "Be helpful, resolve issues quickly." After complaints about over-promising refunds, amendment added: "Don't promise refunds over $50 without escalation." Three months later, holiday override: "$50 limit raised to $100 for loyalty-tier customers."
Q3.How does a new instance understand the policy history?
With amendments in place: the new instance sees each layer with its rationale and scope — it can reconstruct the intent behind each rule and the order of precedence.
Without amendments (edited original): the new instance has one instruction with no context on why it evolved, what exceptions apply to what cases, or which parts are permanent vs. seasonal.
The holiday override should carry an explicit expiry: "applies Dec 1–Jan 1; reverts to Layer 2 after." A new instance reading it mid-February then knows the current state unambiguously.
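A sketch of how the layers might be assembled at request time, so the expiry is enforced structurally rather than by someone remembering to delete the override. The layer text follows the scenario; the assembly code, field names, and exact expiry date are assumptions:

```python
from datetime import date

# Each layer keeps its rationale and scope; later layers take precedence over earlier ones.
LAYERS = [
    {"id": 1, "text": "Be helpful, resolve issues quickly."},
    {"id": 2, "text": "Don't promise refunds over $50 without escalation.",
     "rationale": "added after over-promising complaints"},
    {"id": 3, "text": "Holiday override: $50 limit raised to $100 for loyalty-tier customers.",
     "rationale": "seasonal", "expires": date(2025, 1, 2)},   # hypothetical expiry date
]

def build_prompt(today: date) -> str:
    """Expired overrides drop out automatically; the layer beneath still applies."""
    parts = []
    for layer in LAYERS:
        expires = layer.get("expires")
        if expires and today >= expires:
            continue
        note = f" (rationale: {layer['rationale']})" if "rationale" in layer else ""
        parts.append(f"Layer {layer['id']}: {layer['text']}{note}")
    return "\n".join(parts)

print(build_prompt(date(2024, 12, 20)))  # includes the holiday override
print(build_prompt(date(2025, 2, 15)))   # reverts to the $50 rule, no manual cleanup needed
```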
Technique 17
Supersession Protocol
Amend, never edit. New instructions coexist with originals.
Why edits are dangerous
Lost reasoning: why was the original written? Gone.
Silent regression: future editors don't know what changed or why
Model confusion: edited prompts can be internally inconsistent without anyone noticing
No rollback: if the change was wrong, the original is lost
Supersession = amendments, not edits.
The model benefits from seeing the evolution.
Example: Amendment with rationale
## Rule 3 (2024-01)
Always include a confidence score (0-100) with each finding.
## Rule 3a — SUPERSEDES Rule 3 for batch mode (2024-03)
In batch mode (>50 items), omit individual confidence scores.
Instead, provide aggregate confidence at the batch level.
Rationale: individual scores on 200+ items created noise that
obscured the summary. The per-item scores are still computed
internally — they feed the aggregate. Not lost, just hidden.
Rule 3 still applies to single-item and small-batch (<50) mode.
Exercise
A frontend style guide. Version 1 (January): "All buttons use border-radius: 4px." Six months later, someone edits it to: "All buttons use border-radius: 8px." No history of why. A new developer asks: "Why 8px? The mockups from Q1 show 4px."
Discuss: What was lost? How would supersession preserve it?
What are the primitives of a style guide entry that can evolve safely?
Some valid answers
A frontend style guide. Version 1 (January): "All buttons use border-radius: 4px." Six months later, someone edits it to: "All buttons use border-radius: 8px." No history of why. A new developer asks: "Why 8px? The mockups from Q1 show 4px."
Q1.What was lost?
The original value (4px) and its rationale. Why was 4px chosen? Was it a design decision, a technical constraint, a legacy default? Without the history, no one knows if 8px is an improvement or a regression.
The scope of the change. Did 8px replace 4px everywhere, or only for certain button types? An in-place edit can't express "this applies to primary buttons; secondary buttons still use 4px."
The author and date. When did this change? Who decided? The new developer's Q1 mockups show 4px — was the style guide updated before or after those mockups? Without a timestamp, the question is unanswerable.
Some valid answers
A frontend style guide. Version 1 (January): "All buttons use border-radius: 4px." Six months later, someone edits it to: "All buttons use border-radius: 8px." No history of why. A new developer asks: "Why 8px? The mockups from Q1 show 4px."
Q2.How would supersession preserve it?
The supersession entry would say: "border-radius: 8px for primary action buttons (updated 2024-06). Supersedes 4px from January 2024 for this button type only. Rationale: accessibility audit found 4px corners too subtle at small sizes. 4px still applies to icon-only buttons."
The original entry stays in the document. A new developer reading it sees both: the original principle (4px, the default) and the exception (8px, primary buttons, accessibility-motivated). The Q1/current discrepancy is immediately explained.
Future editors know the reasoning. If the next accessibility audit changes the guidance again, they can supersede the 8px entry with a new one, linking the full chain of decisions.
Some valid answers
A frontend style guide. Version 1 (January): "All buttons use border-radius: 4px." Six months later, someone edits it to: "All buttons use border-radius: 8px." No history of why. A new developer asks: "Why 8px? The mockups from Q1 show 4px."
Q3.What are the primitives of a style guide entry that can evolve safely?
Value (the current rule), scope (which components it applies to), rationale (why this value, not another), date (when it was set), and supersedes (what it replaced, if anything).
An entry without rationale can be changed for any reason, since there's no stated reason to contradict. Rationale is what makes a rule defensible and makes future changes deliberate rather than casual.
Scope is the most often omitted. "All buttons use border-radius: X" looks like a universal rule. "Primary action buttons use X; icon-only buttons use Y" is a scoped rule that can evolve independently per scope.
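Those primitives map directly onto a record shape. A sketch of a style-guide entry that can be superseded without losing history; the field names are one plausible choice, not a standard:

```python
from dataclasses import dataclass

@dataclass
class StyleRule:
    value: str                              # the current rule, e.g. "border-radius: 8px"
    scope: str                              # which components it applies to
    rationale: str                          # why this value and not another
    set_on: str                             # when it was set
    supersedes: "StyleRule | None" = None   # the entry it replaced, kept intact

original = StyleRule(
    value="border-radius: 4px", scope="all buttons",
    rationale="initial design default", set_on="2024-01")

amendment = StyleRule(
    value="border-radius: 8px", scope="primary action buttons only",
    rationale="accessibility audit: 4px corners too subtle at small sizes",
    set_on="2024-06", supersedes=original)

# The chain answers the new developer's question without archaeology.
rule = amendment
while rule:
    print(f"{rule.set_on}: {rule.value} ({rule.scope}); rationale: {rule.rationale}")
    rule = rule.supersedes
```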
Technique 14
Backlog Pressure
Structural alternatives to "please be thorough" — systems where incomplete output triggers re-execution.
Making completeness structural
"Be thorough" → aspirational → unreliable
vs.
Output validator → incomplete? → re-invoke → repeat
↓ The model learns: incomplete = more work, not escape
Surface the loop to the model. "You will be re-invoked if criterion X isn't met."
Example: Validator loop
# Pseudo-code: structural thoroughness
requirements = load_requirements()   # 7 items
gaps = []                            # nothing missing yet on the first pass

while True:
    output = model.generate(prompt, requirements, gaps)

    # Validator checks each requirement against the output
    gaps = validator.check(output, requirements)
    if not gaps:
        break  # genuinely complete

    # Feed gaps back — model sees what it missed
    prompt += f"\nINCOMPLETE. Missing: {gaps}\n"
    prompt += "Address these specifically before proceeding."
The prompt itself tells the model: "You will be re-invoked if incomplete."
Exercise
A ticket triage bot processes incoming bug reports. It tags each with severity, assigns to a team, and writes a summary. After processing 50 tickets: 8 have tags but no assignment, 3 have assignment but no summary. Bot reports "50 tickets processed."
Discuss: What are the primitives of "processed"? How would you make completeness structural?
What loop would catch the partial outputs?
Some valid answers
A ticket triage bot processes incoming bug reports. It tags each with severity, assigns to a team, and writes a summary. After processing 50 tickets: 8 have tags but no assignment, 3 have assignment but no summary. Bot reports "50 tickets processed."
Q1.What are the primitives of "processed"?
"Processed" has three sub-steps: (1) severity tagged, (2) team assigned, (3) summary written. The bot counts tickets consumed (received), not tickets where all three sub-steps completed.
The completion predicate is: tag IS NOT NULL AND assignee IS NOT NULL AND summary IS NOT NULL. A ticket satisfying only one or two conditions is partially processed, not done.
The 50-ticket "processed" count conflates "I looked at each" with "I fully handled each." These are different claims; the current system makes only the weaker one.
Some valid answers
A ticket triage bot processes incoming bug reports. It tags each with severity, assigns to a team, and writes a summary. After processing 50 tickets: 8 have tags but no assignment, 3 have assignment but no summary. Bot reports "50 tickets processed."
Q2.How would you make completeness structural?
After processing each ticket, run a validator: validate(ticket) → missing_fields. If missing_fields is non-empty, the ticket returns to the queue. The bot cannot mark it done until the validator passes.
Change the output contract: instead of "N tickets processed," report "N tickets processed, M complete, K pending (missing fields listed per ticket)." The incomplete tickets are the backlog.
The structural guarantee is: the bot's "complete" claim is gated on the validator, not on the bot's self-assessment. Completeness is externally tested, not internally declared.
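A sketch of the validator gate, assuming tickets are plain dicts with the three fields named above; everything else about the triage bot is out of scope:

```python
REQUIRED_FIELDS = ("severity", "assignee", "summary")

def missing_fields(ticket: dict) -> list[str]:
    """Completeness is externally tested: a field counts only if present and non-empty."""
    return [f for f in REQUIRED_FIELDS if not ticket.get(f)]

def report(tickets: list[dict]) -> str:
    incomplete = {}
    for t in tickets:
        gaps = missing_fields(t)
        if gaps:
            incomplete[t["id"]] = gaps
    done = len(tickets) - len(incomplete)
    lines = [f"{len(tickets)} tickets processed, {done} complete, {len(incomplete)} pending"]
    lines += [f"  ticket {tid}: missing {', '.join(gaps)}" for tid, gaps in incomplete.items()]
    return "\n".join(lines)

tickets = [
    {"id": 1, "severity": "high", "assignee": "infra", "summary": "crash on login"},
    {"id": 2, "severity": "low", "assignee": None, "summary": "typo in banner"},
]
print(report(tickets))   # ticket 2 stays in the backlog until assignee is filled
```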
Some valid answers
A ticket triage bot processes incoming bug reports. It tags each with severity, assigns to a team, and writes a summary. After processing 50 tickets: 8 have tags but no assignment, 3 have assignment but no summary. Bot reports "50 tickets processed."
Q3.What loop would catch the partial outputs?
After the initial processing pass, filter for tickets where complete = false. Re-invoke the bot with only those tickets and the specific missing fields listed. The bot knows exactly what's needed.
The loop terminates when the validator finds zero incomplete tickets — not when the bot runs out of input. This is backlog pressure: incomplete output triggers re-execution, not a new batch.
An escape hatch: if a ticket is incomplete after N re-invocations, escalate to human review rather than looping forever. The loop has a stopping condition that isn't "all done."
Session 5 of 5
Architecture Patterns
Technique 20 — Heartbeat Architecture
Technique 21 — Structured Communication
Technique 20
Heartbeat Architecture
Design for iteration cycles, not monolithic outputs.
Why cycles beat monoliths
Monolithic
One shot to get it right
Failure = total restart
Progress invisible until done
Context window = hard ceiling
Cycled
Each cycle self-contained
Failure = redo one cycle
Progress visible at each step
Total work exceeds context
The model can do more total work across many small cycles than in one large response.
Cycle state contract
Illustration — the shape of a cycle contract
## Cycle N reads:
- progress.json: what's been done (items 1-4 complete)
- findings.json: accumulated output so far
- task.md: the full task specification (cold-readable)
## Cycle N produces:
- Process items 5-6 (scope: exactly 2 items per cycle)
- Append results to findings.json
- Update progress.json: items 1-6 complete, 7-10 remaining
## Cycle exit condition:
- progress.json shows all items complete → stop
- Otherwise → invoke cycle N+1
## Invariant: each cycle's output is usable alone.
No cycle produces "partial results that need later cycles
to make sense." If the system crashes after cycle 3,
findings.json contains valid, complete results for items 1-6.
Exercise
A document indexer that processes 10,000 PDFs into a search index. Current implementation: one function call that processes all 10,000, takes 4 hours, and if it fails at PDF 8,500 — starts over from PDF 1.
Discuss: What are the cycle primitives? What state persists between heartbeats?
What's the invariant that makes partial progress usable? How many items per cycle?
Some valid answers
A document indexer that processes 10,000 PDFs into a search index. Current implementation: one function call that processes all 10,000, takes 4 hours, and if it fails at PDF 8,500 — starts over from PDF 1.
Q1.What are the cycle primitives?
Batch size (how many PDFs per cycle), progress cursor (which PDF number was last successfully indexed), and an output artifact (the partial index built so far). Each cycle reads the cursor, processes one batch, and writes both the cursor update and the new index entries.
The cycle exit condition: cursor == total_count. Until then, the system invokes the next cycle. This is structurally analogous to a while loop where the state lives outside the function.
An optional cycle-level result: summary of what was indexed in this batch (count, any errors, any skips). This surfaces problems per-batch rather than at the end of a 4-hour run.
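A sketch of that loop with the state held outside the function. The file names, the batch size, and the `index_pdf` helper are hypothetical; the point is the cursor-plus-batch shape:

```python
import json
from pathlib import Path

CURSOR_FILE = Path("progress.json")   # hypothetical state file
BATCH_SIZE = 100

def index_pdf(path: str) -> dict:
    """Stand-in for the real per-PDF indexing work."""
    return {"path": path, "indexed": True}

def run_one_cycle(all_pdfs: list[str]) -> bool:
    """Process one batch; return True if more work remains."""
    cursor = json.loads(CURSOR_FILE.read_text())["cursor"] if CURSOR_FILE.exists() else 0
    batch = all_pdfs[cursor:cursor + BATCH_SIZE]

    entries = [index_pdf(p) for p in batch]
    # Append to the partial index; it stays valid and queryable even if a later cycle fails.
    with open("index.jsonl", "a") as out:
        out.writelines(json.dumps(e) + "\n" for e in entries)

    cursor += len(batch)
    CURSOR_FILE.write_text(json.dumps({"cursor": cursor}))   # a crash after this loses nothing
    return cursor < len(all_pdfs)

pdfs = [f"doc_{i}.pdf" for i in range(10_000)]
while run_one_cycle(pdfs):
    pass   # each invocation is one heartbeat; a restart resumes from the saved cursor
```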
Some valid answers
A document indexer that processes 10,000 PDFs into a search index. Current implementation: one function call that processes all 10,000, takes 4 hours, and if it fails at PDF 8,500 — starts over from PDF 1.
Q2.What state persists between heartbeats?
The progress cursor: "last successfully indexed PDF = 8,499." If the cycle crashes at 8,500, the next run starts at 8,500, not at 1.
The partial index itself: the search index built so far, covering PDFs 1 through the cursor. This is both an output artifact and input to the next cycle (it gets appended to, not rebuilt).
Error log: PDFs that failed indexing (corrupt file, unsupported format). These accumulate across cycles as a separate list for human review, separate from the main progress cursor.
Some valid answers
A document indexer that processes 10,000 PDFs into a search index. Current implementation: one function call that processes all 10,000, takes 4 hours, and if it fails at PDF 8,500 — starts over from PDF 1.
Q3.What's the invariant? How many items per cycle?
The invariant: after cycle N, the search index contains valid, queryable entries for PDFs 1 through N×batch_size. A user can search the partial index before the full run completes and get correct results for the indexed portion.
Batch size tradeoff: smaller batches (10-50) make restarts cheaper but add per-batch overhead. Larger batches (500+) reduce overhead but mean more work lost on failure. 100-200 PDFs per cycle balances both concerns well for a 10,000-item corpus.
The key test for batch size: if this cycle crashes partway through, how much re-work do you do on restart? If the answer is "less than 5 minutes of processing," the batch size is calibrated correctly for reliability.
Technique 21
Structured Communication
Metadata on messages enables routing without reading the body.
Attention as a limited resource
10 input messages, equal formatting
↓
Model reads all 10 fully to determine priority
↓
Context budget spent on low-priority items
↓ High-priority items get less attention than they need
Structured headers = attention routing signals.
Example: Metadata-driven triage
Example input — messages the system consumes
---
from: security-scanner
to: code-reviewer
kind: request # requires action (vs. "report" = FYI)
priority: critical
subject: SQL injection in auth.py line 42
---
[full finding details below...]
---
from: style-checker
to: code-reviewer
kind: report # informational, no action required
priority: low
subject: 3 formatting inconsistencies in utils.py
---
[details...]
Prompt literal — the triage instruction
Process `request` messages first. `report` messages are
informational — acknowledge but don't block on them.
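If the headers arrive as structured data, the routing happens before any body is read. A minimal sketch, assuming messages are parsed into dicts with the header fields shown above:

```python
# Messages after header parsing; bodies are untouched during triage.
messages = [
    {"from": "style-checker", "kind": "report", "priority": "low",
     "subject": "3 formatting inconsistencies in utils.py", "body": "..."},
    {"from": "security-scanner", "kind": "request", "priority": "critical",
     "subject": "SQL injection in auth.py line 42", "body": "..."},
]

PRIORITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}

def triage(msgs: list[dict]) -> list[dict]:
    """Action-required messages first, then by priority; no body is read to decide order."""
    return sorted(msgs, key=lambda m: (m["kind"] != "request", PRIORITY_ORDER[m["priority"]]))

for m in triage(messages):
    print(f"{m['kind']:8} {m['priority']:9} {m['subject']}")
```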
Exercise
A monitoring system sends alerts to an on-call engineer. All alerts use identical formatting: subject line + body text. Last week: engineer received 47 alerts in 2 hours, triaged sequentially, and missed a critical database failure buried as alert #31.
Discuss: What metadata would enable routing without reading the body?
What are the primitives of a triageable alert? What's the cost of uniform formatting?
Some valid answers
A monitoring system sends alerts to an on-call engineer. All alerts use identical formatting: subject line + body text. Last week: engineer received 47 alerts in 2 hours, triaged sequentially, and missed a critical database failure buried as alert #31.
Q1.What metadata would enable routing without reading the body?
Severity (critical/high/low/info), service (which system generated the alert), kind (action-required vs. informational), and dedup-key (is this the same alert repeating?). These four fields let you triage 47 alerts in seconds without reading bodies.
A timestamp and a "first-seen" flag: if an alert has been firing for 30 seconds, that's different from one that's been firing for 2 hours. Duration metadata changes the urgency calculation without body inspection.
A "correlated alerts" field: "this alert is related to alert #28 (database failure)" allows the engineer to see that #31 is downstream of #28 — solve one, resolve the other. Without this, both get triaged independently.
Some valid answers
A monitoring system sends alerts to an on-call engineer. All alerts use identical formatting: subject line + body text. Last week: engineer received 47 alerts in 2 hours, triaged sequentially, and missed a critical database failure buried as alert #31.
Q2.What are the primitives of a triageable alert?
Severity (how urgent), service (where it's from), kind (what response it needs: action vs. awareness), dedup-key (is it repeating), and optionally runbook link (what to do). These are the minimum for routing without body-reading.
The body provides evidence and detail — but the triage decision (do I act on this now?) should be answerable from metadata alone. If you need to read the body to know if it's urgent, the metadata is incomplete.
A well-structured alert lets the engineer ask: "What do I need to do, and in what order?" before reading a single alert body. The answer lives entirely in structured headers.
Some valid answers
A monitoring system sends alerts to an on-call engineer. All alerts use identical formatting: subject line + body text. Last week: engineer received 47 alerts in 2 hours, triaged sequentially, and missed a critical database failure buried as alert #31.
Q3.What's the cost of uniform formatting?
Sequential processing cost: without priority metadata, the engineer must read every alert in sequence to find the critical ones. Alert #31 gets the same attention budget as alert #1, regardless of actual urgency.
Cognitive overload: 47 identically-formatted alerts are indistinguishable at a glance. The engineer's working memory fills with low-priority detail before reaching the critical database failure buried in the middle.
Alert fatigue compounds this: after processing 30 alerts without structure, the engineer starts skimming, and the critical alert in a sea of identical noise gets the same skim treatment as the preceding 30 routine ones.
Recap — 13 Techniques
T2. Seed Decision Trees
T3. Steering
T4. Conceptual Anchoring
T5. Operational Cost Framing
T6. Symmetry Detection
T8. Domain-Neutral Primitives
T9. Layer-Crossing Checks
T10. Escape Hatch Design
T11. Constitutional Layering
T14. Backlog Pressure
T17. Supersession Protocol
T20. Heartbeat Architecture
T21. Structured Communication
Apply when the situation calls for it. A system that chooses otherwise may still be right.