Which Pattern When?

The AI section covers many patterns. In production you combine them - rarely "just an LLM" or "just RAG." This page is a capstone decision guide: where to start, what to add next, and what to skip. It assumes you have skimmed Large Language Models and points to deep dives rather than repeating them.

If you want to…

Goal	Start here	Usually add	Often skip (at first)
Answer questions over your docs	RAG	Embeddings, structured outputs for citations/metadata	Agents until retrieval works
Build a support or internal Q&A bot	RAG + AI in Products	Evaluation & LLMOps, privacy	Multi-agent systems
Automate multi-step work (search, act, repeat)	Agents	MCP tools, human-in-the-loop	Pure chat without tools
Use AI inside your dev workflow	AI-Assisted Development	Project memory & rules, Agent Skills	Custom fine-tuning
Ship a feature users see in your product	AI in Products	Structured outputs, cost & latency	Autonomous agents without approval
Keep sensitive data on-prem	Cloud vs Local	Local LLM app, privacy	Sending full corpus to frontier APIs
Control spend at scale	Cost, Latency & Model Routing	Context engineering, evals	Frontier model for every request
Make outputs machine-parseable	Structured Outputs	Validation/repair loops, eval scorers	Free-form prose to regex later
Operate safely in production	Safety + Privacy & Data	Human-in-the-loop, red-teaming via evals	Guardrails as the only layer
Debug wrong or broken behavior	Debugging LLM Apps	Traces from LLMOps	Rewriting the whole prompt blindly

Core decision: what kind of problem is it?

Knowledge problem - model does not know your facts → RAG, knowledge management patterns, or tools that fetch live data.

Action problem - model must do things, not just text → Agents with narrow tools and HITL on destructive steps.

Format problem - downstream code needs JSON/fields → Structured outputs, not "please return JSON."

Process problem - team repeats the same agent ritual → Skills or project memory, not a longer system prompt every time.

RAG vs agent vs chat

Pattern	Shape	Choose when	Watch out for
Chat	Single model call per turn	Transformation, drafting, classification with small context	Stale knowledge, no actions
RAG	Retrieve → generate	Large doc corpus, FAQ, grounding requirements	Bad chunks, injection via retrieved text
Agent	Model + tools in a loop	Dynamic plans, APIs, code execution, multi-step research	Cost, latency, runaway loops, tool sprawl
Workflow	Fixed steps (your code controls)	Known pipeline, compliance, predictable cost	Less flexible than agents

Hybrid is normal: RAG inside an agent (retrieve then act), or router that picks chat vs RAG vs agent per request (cost & latency).

Adaptation ladder (cheapest first)

From LLMs - try in order; stop when quality is good enough:

Prompt / context - instructions, examples, context engineering
RAG - fresh, private knowledge at inference
Project memory / rules / skills - repeatable team conventions and workflows
Fine-tuning / LoRA - domain style or format the model resists via prompting (see RAG vs fine-tuning)
Pre-training - almost never

For coding agents, (3) often beats (4).

Configuration stack for coding agents

Need	Use
Repo orientation, build commands	`AGENTS.md` / project memory
File-type conventions	Cursor rules
Multi-step rituals (release, review)	Agent skills
External systems (Jira, DB)	MCP servers

Do not duplicate the same checklist in memory, rules, and skills - one source of truth per concern.

Production checklist (any pattern)

Before launch, you should have answers for:

Eval or test set - representative inputs + pass criteria (Evaluation & LLMOps)
Cost/latency budget - model tier, max tokens, max agent rounds (Cost & Latency)
Failure UX - API down, refusal, wrong answer (AI in Products)
Data handling - what leaves your network (Privacy & Data)
Safety - injection surface, tool permissions (Safety)
Runbook - how to diagnose incidents (Debugging LLM Apps)

Common anti-patterns

Agent first - jumping to agents before retrieval and prompts work
Frontier everywhere - no routing; invoice surprises (Cost & Latency)
Guardrails only - no HITL on money/deletes/publish (Human-in-the-Loop)
Eval never - shipping on vibes (Evaluation & LLMOps)
Giant system prompt - everything in one blob instead of skills, RAG, and tools (Context Engineering)

Which Pattern When?

If you want to…

Core decision: what kind of problem is it?

RAG vs agent vs chat

Adaptation ladder (cheapest first)

Configuration stack for coding agents

Production checklist (any pattern)

Common anti-patterns

Suggested reading order

See also

If you want to…​

Core decision: what kind of problem is it?​

RAG vs agent vs chat​

Adaptation ladder (cheapest first)​

Configuration stack for coding agents​

Production checklist (any pattern)​

Common anti-patterns​

Suggested reading order​

See also​

If you want to…

Core decision: what kind of problem is it?

RAG vs agent vs chat

Adaptation ladder (cheapest first)

Configuration stack for coding agents

Production checklist (any pattern)

Common anti-patterns

Suggested reading order

See also