RAG Design Canvas¶

Use this canvas to design and document the architecture of a Retrieval-Augmented Generation (RAG) system. Complete it together with the Tech Lead, Data Scientist and Context Builder.

When to complete this canvas?

Required when the AI system gains access to more than one knowledge source (documents, databases, APIs). See also the Context Builder role and AI Architecture.

Download this template

Download as Markdown — Open in your editor or AI assistant and fill in the fields.

A. Use Case & Trigger¶

Field	Fill in
User question	What does the end user typically ask?
Trigger	When is RAG activated? (always / on low confidence / on specific keywords)
What may the model NOT do?	Hard Boundaries for the retrieval path (e.g. never give medical advice)
Expected response format	Text / Table / JSON / Cited answer with sources

B. Document Inventory¶

Knowledge source	File format	Volume (estimated)	Update frequency	Owner
[E.g. Product catalogue]	PDF / DOCX / CSV	[number of documents / MB]	Daily / Weekly / Static	[name]

Context pollution risk: Is there a risk that irrelevant sources degrade model responses? ☐ Yes → see Section G · ☐ No

C. Chunking Strategy¶

Parameter	Choice	Motivation
Split method	☐ Fixed size · ☐ Section-based · ☐ Paragraph · ☐ Semantic
Chunk size (tokens)	[e.g. 512 tokens]
Overlap (tokens)	[e.g. 64 tokens]
Metadata per chunk	☐ Source title · ☐ Page number · ☐ Date · ☐ Author

Guideline

Use section-based chunking for structured documents (reports, manuals). Use fixed size + overlap for continuous text. Larger chunks provide more context but higher cost per retrieval.

D. Embedding Model¶

Parameter	Choice
Model	[e.g. text-embedding-3-small (OpenAI) / embed-multilingual-v3 (Cohere)]
Dimensions	[e.g. 1536]
Provider	[e.g. OpenAI / Cohere / Hugging Face / local]
Multilingual?	☐ Yes (NL + EN) · ☐ No
Cost per 1M tokens	[e.g. €0.02]

E. Vector Store¶

Parameter	Choice
Technology	☐ Pinecone · ☐ Weaviate · ☐ pgvector · ☐ Chroma · ☐ Qdrant · ☐ Other: ____
Hosting model	☐ Cloud (managed) · ☐ Self-hosted · ☐ In-memory (dev/test)
Indexing strategy	☐ Flat · ☐ HNSW · ☐ IVF
Estimated vector count	[e.g. 50,000 chunks]
Backup & recovery	☐ Daily · ☐ Weekly · ☐ N/A

F. Retriever Parameters¶

Parameter	Value	Motivation
Top-K	[e.g. 5]	How many chunks are passed to the LLM?
Similarity threshold	[e.g. ≥ 0.75]	Minimum cosine similarity for inclusion in context
Re-ranking?	☐ Yes (model: ____) · ☐ No	Cross-encoder re-ranking increases precision
Hybrid search?	☐ Yes (keyword + vector) · ☐ No
Max context (tokens)	[e.g. 4096]	Total context limit for retrieval output

G. Context Quality & CDL¶

The Context Builder manages the Context Development Lifecycle (CDL): which information is current, what is outdated?

Check	Status
Is there a process for removing outdated documents?	☐ Yes · ☐ No → action required
Are irrelevant chunks filtered before LLM call?	☐ Yes · ☐ No
Has the maximum context size been determined (context pollution prevention)?	☐ Yes · ☐ No
Are source citations included in the response?	☐ Yes · ☐ No
Has the Context Builder role been assigned?	☐ Yes, name: ____ · ☐ No — automated

H. Quality Metrics¶

Metric	Definition	Target	Measurement
Precision@K	% relevant chunks in top-K results	≥ 80%	Offline evaluation on Golden Set
Recall@K	% relevant chunks retrieved	≥ 70%	Offline evaluation on Golden Set
Faithfulness	Answer based on retrieved context (no hallucination)	≥ 90%	RAGAS or manual review
Answer Relevance	Answer relevant to the question asked	≥ 85%	RAGAS or manual review
Latency (p95) (95th percentile — 95% of all requests are faster than this value)	Retrieval + generation time	\< 3 seconds	Production monitoring

I. Cost Estimate¶

Cost item	Unit	Estimated volume/month
Embedding (initial)	per 1M tokens	[one-time]
Embedding (updates)	per 1M tokens/month
Vector store storage	per GB/month
LLM inference (retrieval)	per 1M tokens
Total (month)

See also: Cost Optimisation and GAINS™ framework for ROI linkage.

J. Approval¶

Role	Name	Date	Signature
Tech Lead
Data Scientist
Context Builder
Guardian

Related modules:

Was this page helpful? Give feedback