Template 09.06: Validation Report (Evidence Package)

When to use this?

You are completing a validation pilot and need to compile the evidence package for the Gate Review decision (Go / Go with actions / No-Go).

1. Summary (1 page)

Project: [Name]
Risk Level: [Minimal / Limited / High]
Collaboration Mode: [1–5]
Release/Build: [e.g. RC-1]
Test period: [YYYY-MM-DD to YYYY-MM-DD]

Conclusion (choose one)

Go: meets the Evidence Standards norms for this risk level
Go with actions: release only after completing the actions in §8 (Action Plan)
No-Go: does not meet the norms; redesign, retraining, or reformulation required

Top 3 findings:

...
...
...

2. Scope & references (traceability)

Objective Card version: [link/ID]
Hard Boundaries version: [link/ID]
System Prompts version: [link/ID]
Model Card version: [link/ID]
Test protocol version (Golden Set Test): [link/ID]
Risk Pre-Scan: [link/ID]

3. Test Setup

Environment: [Dev/Test/Prod-simulation]
Model settings: [e.g. temperature, max tokens]
Knowledge Coupling: [Yes/No]; if yes, state the source set and its update frequency
Preconditions: [e.g. rate limits, timeouts, tooling]

4. Test Sets (Golden Set + supplements)

Golden Set

Number of cases: [minimum according to Evidence Standards]
Origin: [tickets, emails, calls, forms...]
Coverage: [80/15/5 or 70/20/10 depending on risk level]
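
As a rough aid for filling in the coverage field, the sketch below turns a ratio such as 80/15/5 or 70/20/10 into whole case counts for a Golden Set of a given size. The function name `split_cases` is hypothetical, and the three tiers are left unnamed because this template does not define them; take the tier definitions and the minimum set size from your Evidence Standards.

```python
def split_cases(total: int, ratio: tuple[int, ...]) -> list[int]:
    """Split `total` cases across tiers in proportion to `ratio`.

    Uses largest-remainder rounding so the counts always sum to `total`.
    """
    if total < 0 or not ratio or any(r < 0 for r in ratio) or sum(ratio) == 0:
        raise ValueError("total must be >= 0 and ratio must contain positive weights")

    weight = sum(ratio)
    # Integer arithmetic avoids floating-point rounding surprises.
    counts = [total * r // weight for r in ratio]
    remainders = [total * r % weight for r in ratio]

    # Hand the leftover cases to the tiers with the largest remainders.
    leftover = total - sum(counts)
    by_remainder = sorted(range(len(ratio)), key=lambda i: remainders[i], reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


print(split_cases(100, (70, 20, 10)))  # [70, 20, 10]
print(split_cases(50, (80, 15, 5)))    # counts sum to 50
```

When the ratio does not divide the set size evenly, record the counts you actually used so reviewers can see how the rounding was resolved.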

Adversarial set (required for Limited/High)

Number of adversarial prompts: [#]
Types: jailbreak / prompt injection / data leak / source fabrication

Fairness set (required for High)

Approach: [quantitative / qualitative + motivation]
Groups/segments: [describe without sensitive details]

5. Results vs Evidence Standards

Criterion	Norm	Measured	Pass/Fail
Critical errors	0	[#]	[ ] Pass [ ] Fail
Major errors (max)	[#]	[#]	[ ] Pass [ ] Fail
Factuality	[≥..%]	[..%]	[ ] Pass [ ] Fail
Relevance (1–5)	[≥..]	[..]	[ ] Pass [ ] Fail
Safety (refusal)	100%	[..%]	[ ] Pass [ ] Fail
Transparency (if applicable)	100%	[..%]	[ ] Pass [ ] Fail
Fairness (bias)	[≤..%]	[..%]	[ ] Pass [ ] Fail
Audit trail	per standard	[..]	[ ] Pass [ ] Fail
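
The Pass/Fail column can be derived mechanically once the norms are filled in. The following is a minimal sketch, not a prescribed tool: the `Criterion` class and `evaluate` function are hypothetical names, only the three fixed norms from the table (0 critical errors, 100% safety, 100% transparency) are used, and the measured values are invented placeholders rather than real results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    name: str
    norm: float
    measured: float
    # True: measured must be >= norm (e.g. a percentage score).
    # False: measured must be <= norm (e.g. an error count).
    higher_is_better: bool

    def passed(self) -> bool:
        if self.higher_is_better:
            return self.measured >= self.norm
        return self.measured <= self.norm


def evaluate(criteria: list[Criterion]) -> dict[str, str]:
    """Map each criterion name to "Pass" or "Fail"."""
    return {c.name: "Pass" if c.passed() else "Fail" for c in criteria}


# Norms below are the fixed ones from the table; measured values are
# illustrative placeholders only.
example = [
    Criterion("Critical errors", norm=0, measured=0, higher_is_better=False),
    Criterion("Safety (refusal)", norm=100.0, measured=98.5, higher_is_better=True),
    Criterion("Transparency", norm=100.0, measured=100.0, higher_is_better=True),
]

for name, verdict in evaluate(example).items():
    print(f"{name}: {verdict}")
```

Criteria with risk-level-dependent norms (major errors, factuality, relevance, fairness) fit the same structure once their thresholds are taken from the Evidence Standards.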

6. Error Overview (mandatory)

Critical errors (0 permitted)

Case-ID	Description	Impact	Cause	Fix	Status

Major errors

Case-ID	Description	Impact	Cause	Fix	Status

Recurring patterns (failure modes)

[e.g. source attribution incorrect for document type X]
[e.g. overly creative tone on short prompts]

7. Logging & Audit Trail (evidence that results can be traced back)

What we log: [according to Evidence Standards §7]
Where it is stored: [tool + location]
Retention: [90 days / 12 months / other]
Privacy measures: [hashing/pseudonymisation/redaction]
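
To make the privacy measures concrete, here is one possible sketch of pseudonymisation and redaction applied before a log entry is stored. It assumes keyed hashing (HMAC-SHA256) for user identifiers and a deliberately simple email pattern; the function names, the log fields, and the `[REDACTED_EMAIL]` marker are all hypothetical, and your Evidence Standards may require different fields or techniques.

```python
import hashlib
import hmac
import re

# Simplified pattern for illustration; it will not catch every valid address.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Return a stable keyed hash so entries stay linkable without exposing the ID."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def redact_emails(text: str) -> str:
    """Replace anything that looks like an email address with a marker."""
    return EMAIL_PATTERN.sub("[REDACTED_EMAIL]", text)


def build_log_entry(case_id: str, user_id: str, prompt: str, secret_key: bytes) -> dict:
    """Assemble a log record with the user pseudonymised and the prompt redacted."""
    return {
        "case_id": case_id,
        "user": pseudonymise(user_id, secret_key),
        "prompt": redact_emails(prompt),
    }


# Hypothetical key for demonstration; load a real key from a secret store.
demo_key = b"replace-with-a-real-secret"
entry = build_log_entry("GS-001", "user-42", "Mail jan@example.com please", demo_key)
print(entry["prompt"])
```

A keyed hash is used here instead of a plain hash because unkeyed hashes of low-entropy identifiers can be reversed by guessing; whoever holds the key can still re-link entries, so key storage belongs in the "Where it is stored" answer.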

8. Action Plan (complete only if "Go with actions" or "No-Go")

Action	Owner	Deadline	Expected effect	Verification (test)

9. Go/No-Go Sign-off

Tech Lead: ____________________
AI Product Manager: ____________________
Guardian: ____________________