Which Pattern When?
The AI section covers many patterns. In production you combine them -- rarely "just an LLM" or "just RAG." This page is a capstone decision guide: where to start, what to add next, and what to skip. It assumes you have skimmed Large Language Models and points to deep dives rather than repeating them.
If you want to…
| Goal | Start here | Usually add | Often skip (at first) |
|---|---|---|---|
| Answer questions over your docs | RAG | Embeddings, structured outputs for citations/metadata | Agents until retrieval works |
| Build a support or internal Q&A bot | RAG + AI in Products | Evaluation & LLMOps, privacy | Multi-agent systems |
| Automate multi-step work (search, act, repeat) | Agents | MCP tools, human-in-the-loop | Pure chat without tools |
| Use AI inside your dev workflow | AI-Assisted Development | Project memory & rules, Agent Skills | Custom fine-tuning |
| Ship a feature users see in your product | AI in Products | Structured outputs, cost & latency | Autonomous agents without approval |
| Keep sensitive data on-prem | Cloud vs Local | Local LLM app, privacy | Sending full corpus to frontier APIs |
| Control spend at scale | Cost, Latency & Model Routing | Context engineering, evals | Frontier model for every request |
| Make outputs machine-parseable | Structured Outputs | Validation/repair loops, eval scorers | Free-form prose that you parse with regex later |
| Operate safely in production | Safety + Privacy & Data | Human-in-the-loop, red-teaming via evals | Guardrails as the only layer |
| Debug wrong or broken behavior | Debugging LLM Apps | Traces from LLMOps | Rewriting the whole prompt blindly |
Core decision: what kind of problem is it?
Knowledge problem -- model does not know your facts → RAG, knowledge management patterns, or tools that fetch live data.
Action problem -- model must do things, not just produce text → Agents with narrow tools and human-in-the-loop (HITL) approval on destructive steps.
Format problem -- downstream code needs JSON/fields → Structured outputs, not "please return JSON."
Process problem -- team repeats the same agent ritual → Skills or project memory, not a longer system prompt every time.
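For the format problem, the difference between asking for JSON and enforcing it can be sketched as a validate-and-repair loop. This is a minimal illustration, not any provider's API: `call_model` is a hypothetical callable standing in for whatever client you use, and `REQUIRED_FIELDS` is an invented schema. A provider's native structured-output mode or a schema library would normally replace the hand-rolled check.

```python
import json
from typing import Callable, Optional

# Hypothetical schema for illustration: field name -> expected Python type.
REQUIRED_FIELDS = {"answer": str, "sources": list}


def validate(raw: str) -> tuple[Optional[dict], str]:
    """Return (parsed, "") on success or (None, error message) on failure."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, f"invalid JSON: {exc.msg}"
    if not isinstance(data, dict):
        return None, "top-level value must be an object"
    for name, expected in REQUIRED_FIELDS.items():
        if name not in data:
            return None, f"missing field: {name}"
        if not isinstance(data[name], expected):
            return None, f"field {name} must be {expected.__name__}"
    return data, ""


def generate_structured(
    prompt: str,
    call_model: Callable[[str], str],  # assumption: takes a prompt, returns raw text
    max_attempts: int = 3,
) -> dict:
    """Call the model, validate, and feed the error back until it passes."""
    current = prompt
    for _ in range(max_attempts):
        data, error = validate(call_model(current))
        if data is not None:
            return data
        # Repair step: tell the model what was wrong and ask again.
        current = f"{prompt}\n\nYour previous reply was rejected ({error}). Reply with JSON only."
    raise ValueError(f"no valid output after {max_attempts} attempts")
```

Capping the attempts keeps a stubborn failure from turning into an unbounded cost; the final `ValueError` is where a failure UX would take over.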
RAG vs agent vs chat
| Pattern | Shape | Choose when | Watch out for |
|---|---|---|---|
| Chat | Single model call per turn | Transformation, drafting, classification with small context | Stale knowledge, no actions |
| RAG | Retrieve → generate | Large doc corpus, FAQ, grounding requirements | Bad chunks, injection via retrieved text |
| Agent | Model + tools in a loop | Dynamic plans, APIs, code execution, multi-step research | Cost, latency, runaway loops, tool sprawl |
| Workflow | Fixed steps (your code controls) | Known pipeline, compliance, predictable cost | Less flexible than agents |
Hybrid is normal: RAG inside an agent (retrieve, then act), or a router that picks chat, RAG, or agent per request (see Cost & Latency).
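A per-request router can be as simple as a few ordered checks. The sketch below is one possible shape under stated assumptions: the `Request` fields are hypothetical flags, and in a real system they might come from a cheap classifier or from the calling feature rather than being set by hand.

```python
from dataclasses import dataclass


@dataclass
class Request:
    text: str
    needs_private_docs: bool = False  # answer depends on your own corpus
    needs_actions: bool = False       # must call tools or APIs, possibly in several steps


def route(request: Request) -> str:
    """Pick the cheapest pattern that can satisfy the request."""
    if request.needs_actions:
        # An agent can still call retrieval as one of its tools (RAG inside an agent).
        return "agent"
    if request.needs_private_docs:
        return "rag"
    return "chat"
```

For example, `route(Request("Summarize this paragraph"))` falls through to `"chat"`, while a request flagged with `needs_actions=True` goes to the agent path even if it also needs private documents.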
Adaptation ladder (cheapest first)
Adapted from the Large Language Models page -- try these in order and stop when quality is good enough:
1. Prompt / context -- instructions, examples, context engineering
2. RAG -- fresh, private knowledge at inference
3. Project memory / rules / skills -- repeatable team conventions and workflows
4. Fine-tuning / LoRA -- domain style or format the model resists via prompting (see RAG vs fine-tuning)
5. Pre-training -- almost never
For coding agents, (3) often beats (4).
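The "try in order, stop when good enough" rule can be sketched as a loop over the ladder. This is illustrative only: `evaluate` is a hypothetical function that scores your system on an eval set with everything up to and including the given rung applied, and the rung labels are just strings.

```python
from typing import Callable, Optional

# The ladder from this section, cheapest first.
LADDER = [
    "prompt/context",
    "rag",
    "project memory/rules/skills",
    "fine-tuning/lora",
    "pre-training",
]


def first_sufficient_rung(evaluate: Callable[[str], float], target: float) -> Optional[str]:
    """Return the first rung whose eval score meets the target, or None if none does."""
    for rung in LADDER:
        if evaluate(rung) >= target:
            return rung  # stop climbing: quality is good enough
    return None
```

The point of the sketch is the early return: each later rung costs more to build and maintain, so the eval score decides when to stop.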
Configuration stack for coding agents
| Need | Use |
|---|---|
| Repo orientation, build commands | AGENTS.md / project memory |
| File-type conventions | Cursor rules |
| Multi-step rituals (release, review) | Agent skills |
| External systems (Jira, DB) | MCP servers |
Do not duplicate the same checklist in memory, rules, and skills -- one source of truth per concern.
Production checklist (any pattern)
Before launch, you should have answers for:
- Eval or test set -- representative inputs + pass criteria (Evaluation & LLMOps)
- Cost/latency budget -- model tier, max tokens, max agent rounds (Cost & Latency)
- Failure UX -- API down, refusal, wrong answer (AI in Products)
- Data handling -- what leaves your network (Privacy & Data)
- Safety -- injection surface, tool permissions (Safety)
- Runbook -- how to diagnose incidents (Debugging LLM Apps)
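The cost/latency item is easier to answer when the limits live in code rather than in a document. Below is a minimal sketch of a budget that caps agent rounds and total tokens; `Budget`, `run_with_budget`, and the default numbers are assumptions for illustration, and `step` stands in for one model-plus-tools round in your own loop.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Budget:
    max_agent_rounds: int = 5       # illustrative default, not a recommendation
    max_total_tokens: int = 20_000  # illustrative default, not a recommendation


def run_with_budget(step: Callable[[], tuple[bool, int]], budget: Budget) -> int:
    """Call step() until it reports done; step returns (done, tokens_used_this_round).

    Returns the total tokens spent. Raises RuntimeError when either limit is exceeded.
    """
    total_tokens = 0
    for _ in range(budget.max_agent_rounds):
        done, tokens = step()
        total_tokens += tokens
        if total_tokens > budget.max_total_tokens:
            raise RuntimeError("token budget exceeded")
        if done:
            return total_tokens
    raise RuntimeError("round limit reached before the task finished")
```

Both failure paths raise instead of returning partial work, which forces the caller to decide what the failure UX looks like.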
Common anti-patterns
- Agent first -- jumping to agents before retrieval and prompts work
- Frontier everywhere -- no routing; invoice surprises (Cost & Latency)
- Guardrails only -- no HITL on money/deletes/publish (Human-in-the-Loop)
- Eval never -- shipping on vibes (Evaluation & LLMOps)
- Giant system prompt -- everything in one blob instead of skills, RAG, and tools (Context Engineering)
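As a counterpoint to the "guardrails only" item, a human-in-the-loop gate on destructive tools can be a few lines in the tool dispatcher. This is a sketch under assumptions: the tool names in `REQUIRES_APPROVAL` are hypothetical, and `approve` is a placeholder for however your product asks a person (a UI prompt, a ticket, a chat message).

```python
from typing import Callable

# Hypothetical tool names mirroring the money/deletes/publish examples.
REQUIRES_APPROVAL = {"refund_payment", "delete_record", "publish_post"}


def execute_tool(
    name: str,
    args: dict,
    tools: dict[str, Callable[..., object]],
    approve: Callable[[str, dict], bool],  # assumption: returns True if a human approved
) -> object:
    """Run a tool, asking a human first when the tool is on the destructive list."""
    if name not in tools:
        # Deny anything that was not explicitly registered.
        raise KeyError(f"unknown tool: {name}")
    if name in REQUIRES_APPROVAL and not approve(name, args):
        raise PermissionError(f"{name} was not approved")
    return tools[name](**args)
```

Read-only tools pass straight through, so the approval step adds friction only where a mistake is expensive.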
Suggested reading order
New to LLMs: LLM → Context Engineering → this page → your goal row in the table above.
Building a product feature: AI in Products → Structured Outputs → RAG or Agents → Eval & LLMOps.
Using AI as an engineer: AI-Assisted Development → Project Memory & Rules → Agent Skills.
Operating in production: Evaluation & LLMOps → Debugging LLM Apps → Cost & Latency.
See also
- Debugging LLM Apps -- when something breaks in prod
- Knowledge Management with LLMs -- RAG vs wiki vs just-in-time
- Tooling and Frameworks -- frameworks and observability
- AI Glossary -- terms across all patterns