The AI Compendium
a Build Labs guide
Shelf 04

Architecture patterns

How real AI systems are built. These are the recurring shapes — knowing them gives you a vocabulary to read papers, evaluate vendors, and design your own systems.

Prompt engineering

Writing clear, specific instructions with structure, examples, and a defined output format. The cheapest, highest-leverage skill.

When to reach for it
Start here. Most 'AI doesn't work' problems are prompt problems in disguise.
Be specific · Show examples · Constrain the output · Iterate
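
A minimal sketch of what "specific and constrained" looks like in practice, in Python. Here call_llm(prompt) -> str is a hypothetical stand-in for whichever provider client you use; the shape of the prompt is the point.

    PROMPT = """You are a support triage assistant.

    Task: classify the ticket below into exactly one of:
    billing, bug, feature_request, other.

    Rules:
    - Answer with the category name only, lowercase, no punctuation.
    - If unsure, answer "other".

    Ticket:
    {ticket}
    """

    def classify(ticket: str) -> str:
        # call_llm is hypothetical: swap in your provider's client call.
        return call_llm(PROMPT.format(ticket=ticket)).strip()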

Few-shot prompting

Include 2–5 input/output examples in your prompt to teach the model the pattern you want.

When to reach for it
When a task is hard to describe but easy to demonstrate.
Cover edge cases · Use diverse examples · Match formatting exactly
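
A sketch of the same idea, reusing the hypothetical call_llm from above. The examples carry the instruction: they cover punctuation and accented characters, and every pair matches the output formatting exactly.

    FEW_SHOT = """Rewrite product names as URL slugs.

    Input: Solar Charger (2024 Edition)
    Output: solar-charger-2024-edition

    Input: Café au Lait Maker
    Output: cafe-au-lait-maker

    Input: {name}
    Output:"""

    def slugify(name: str) -> str:
        return call_llm(FEW_SHOT.format(name=name)).strip()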

RAG (Retrieval-Augmented Generation)

Pull relevant chunks from your own data into the prompt at query time so the model can answer with grounded information.

When to reach for it
When answers must come from your docs, not the model's memory.
Chunk smartly · Embed and search · Rerank top results · Cite sources
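
A toy end-to-end sketch. embed(text) -> vector and call_llm are hypothetical; any embedding model and chat model fit. In production you embed chunks once at index time and query a vector store, not a per-request loop like this.

    import math

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm

    def answer(question, chunks, k=3):
        qv = embed(question)  # hypothetical embedding call
        ranked = sorted(chunks, key=lambda c: cosine(embed(c), qv), reverse=True)
        context = "\n---\n".join(ranked[:k])
        return call_llm(
            "Answer using only the context below, and cite the chunk you used.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}"
        )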

Tool use / function calling

Give the model a list of tools it can invoke. It decides when to call them and how to use the results.

When to reach for it
When the model needs fresh data, external actions, or precise computation.
Clear tool names · Typed inputs · Idempotent actions · Handle errors
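
A dispatch-loop sketch, assuming the model is prompted to reply in JSON as either {"tool": name, "args": {...}} or {"answer": text}. call_llm and fetch_weather are hypothetical.

    import json

    TOOLS = {
        "get_weather": lambda city: fetch_weather(city),  # hypothetical backend
    }

    def run_with_tools(prompt: str) -> str:
        msg = json.loads(call_llm(prompt))
        while "tool" in msg:
            result = TOOLS[msg["tool"]](**msg["args"])  # validate args in real code
            prompt += f"\nTool result: {json.dumps(result)}"
            msg = json.loads(call_llm(prompt))
        return msg["answer"]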

Agents

A loop: the model plans, takes an action with a tool, observes the result, and repeats until the task is done.

When to reach for it
Multi-step tasks where the steps aren't known in advance.
Bounded loops · Good tools beat smart prompts · Log everything · Stop conditions
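
The loop itself is small; the discipline is in the bound and the log. A sketch reusing the hypothetical call_llm, TOOLS, and JSON contract from above, plus a hypothetical log().

    import json

    MAX_STEPS = 8  # bound the loop: agents must never run unbounded

    def agent(task: str) -> str:
        history = [f"Task: {task}"]
        for step in range(MAX_STEPS):
            action = json.loads(call_llm("\n".join(history)))
            if "answer" in action:  # explicit stop condition
                return action["answer"]
            observation = TOOLS[action["tool"]](**action["args"])
            history.append(f"Step {step}: {action['tool']} -> {observation}")
            log(history[-1])  # log everything, every step
        return "Stopped: step budget exhausted"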

Multi-agent systems

Specialized agents collaborate — a planner, a coder, a critic — coordinated by an orchestrator.

When to reach for it
Complex workflows where a single agent loses focus. Usually overkill — start simple.
Clear roles · Explicit hand-offs · Shared scratchpad · Avoid chatter
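
The simplest orchestrator is a fixed pipeline with explicit hand-offs, not a free-form conversation. A sketch on the hypothetical call_llm:

    def orchestrate(task: str) -> str:
        # Each "agent" is just a role prompt; the orchestrator owns the hand-offs.
        plan = call_llm(f"You are a planner. Break this into steps:\n{task}")
        draft = call_llm(f"You are a coder. Implement this plan:\n{plan}")
        critique = call_llm(f"You are a critic. List concrete defects in:\n{draft}")
        # One bounded revision pass, not open-ended chatter.
        return call_llm(
            f"Revise the draft to fix the defects.\nDraft:\n{draft}\nDefects:\n{critique}"
        )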

Fine-tuning

Train a model further on your data so it adopts your style, format, or domain knowledge by default.

When to reach for it
Last resort. Try prompting and RAG first. Fine-tune for style, latency, or cost — rarely for knowledge.
Clean dataset · Quality > quantity · Evaluate against baseline · LoRA when possible
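
Most of the work is dataset prep. A sketch that writes prompt/completion pairs as JSONL, a shape most fine-tuning APIs accept in some variant; check your provider's docs for the exact schema.

    import json

    def write_dataset(pairs, path="train.jsonl"):
        # One training example per line. Hold some pairs out
        # to evaluate the tuned model against the base model.
        with open(path, "w") as f:
            for prompt, completion in pairs:
                f.write(json.dumps({"prompt": prompt, "completion": completion}) + "\n")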

Evals

Automated tests for LLM output. Without them, you have no way to know if a prompt change helped or hurt.

When to reach for it
From day two. Before scale, before launch, before optimization.
Golden set · Rubric scoring · LLM-as-judge · Track over time
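
An eval can start as a loop over a golden set. A sketch scoring the classify() function from the prompt-engineering sketch above:

    GOLDEN = [
        ("Refund for order 1234, please", "billing"),
        ("App crashes on login", "bug"),
        ("Could you add dark mode?", "feature_request"),
    ]

    def run_eval(classify) -> float:
        hits = sum(classify(q) == expected for q, expected in GOLDEN)
        score = hits / len(GOLDEN)
        print(f"accuracy {score:.0%} on {len(GOLDEN)} golden cases")
        return score  # track this per prompt version, over time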

Guardrails

Input and output filters that catch unsafe, off-topic, or malformed responses before they reach users.

When to reach for it
Any user-facing deployment.
Input validation · Output schemas · Topic checks · Fail closed
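
A sketch of the fail-closed pattern, again on the hypothetical call_llm: validate the input, require the output to parse against a schema, and return a safe default on any failure.

    import json

    ALLOWED = {"billing", "bug", "feature_request", "other"}

    def guarded_classify(user_input: str) -> str:
        if not user_input or len(user_input) > 4000:  # input validation
            return "other"
        raw = call_llm(f'Reply as JSON {{"category": "..."}} for:\n{user_input}')
        try:
            category = json.loads(raw)["category"]
            if category not in ALLOWED:
                raise ValueError(category)
        except (ValueError, KeyError):
            return "other"  # fail closed; never pass malformed output through
        return category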

Context engineering

Deliberately managing what's in the model's working memory — what to include, what to summarize, what to drop.

When to reach for it
Long-running agents, big documents, expensive prompts.
Compression · Summaries · Sub-agents for isolation · Token budgets
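
A budget-trimming sketch: keep the newest turns verbatim, compress the rest. tokens() is a crude estimate and summarize() is a hypothetical one-call compressor; use a real tokenizer in practice.

    BUDGET = 8000  # token budget for the whole context

    def tokens(text: str) -> int:
        return len(text) // 4  # rough heuristic, not a real tokenizer

    def fit_context(system: str, history: list[str]) -> list[str]:
        kept, used = [], tokens(system)
        for turn in reversed(history):
            if used + tokens(turn) > BUDGET:
                old = history[: len(history) - len(kept)]
                kept.insert(0, summarize("\n".join(old)))  # hypothetical
                break
            kept.insert(0, turn)
            used += tokens(turn)
        return [system] + kept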

Prompt caching

Cache long, stable prompt prefixes so repeated calls reuse the work. Big cost and latency wins.

When to reach for it
Whenever a large system prompt or document is reused across calls.
Stable prefix · Cache key discipline · Measure hit rate
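
Caching happens on the provider's side, but you earn the hits by how you assemble the prompt: caches are generally keyed on an exact prefix match, so the stable part must be byte-identical on every call. load_text is a hypothetical file reader.

    SYSTEM = load_text("system_prompt.txt")  # long and stable: the cacheable prefix
    POLICY = load_text("policy_doc.txt")     # also identical across calls

    def build_prompt(question: str) -> str:
        # Stable prefix first, variable suffix last. Any change in the
        # prefix, even whitespace, misses the cache. See your provider's
        # docs for how to opt in and where hit rates are reported.
        return f"{SYSTEM}\n\n{POLICY}\n\nQuestion: {question}"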

Streaming

Stream tokens to the user as they're generated. Perceived latency drops dramatically.

When to reach for it
Anywhere a human is waiting for the output.
Server-sent events · Render incrementally · Cancel on user exit
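
A consumer-side sketch, assuming a hypothetical stream_llm(prompt) generator that yields text deltas (most provider SDKs expose streaming as an iterator of roughly this shape):

    import sys

    def render_stream(prompt: str) -> None:
        try:
            for delta in stream_llm(prompt):
                sys.stdout.write(delta)  # render each delta as it arrives
                sys.stdout.flush()
        except KeyboardInterrupt:
            pass  # cancel cleanly when the user bails out
        print()
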
Decision aid

Reach for the simplest thing that works.

If…                                            Then
Wrong tone, wrong format, wrong reasoning?     Prompt engineering. Try first, always.
Needs information the model doesn't have?      RAG. Index your data, retrieve, inject.
Needs to take actions or fetch live data?      Tool use. Define functions, let the model call them.
Needs to plan multi-step work?                 Agents — but bound the loop and log everything.
Want a specific style or format every time?    Few-shot first. Fine-tune only if prompting fails.
Going to production?                           Evals + guardrails + streaming. Non-negotiable.