Structured Outputs
Most production LLM features do not need paragraphs -- they need data: a classification label, a list of extracted fields, a tool argument object, or a config patch your code can parse and act on. Structured outputs are the discipline of making that contract explicit, validated, and recoverable when the model drifts. This is reliability engineering, not prompt trivia.
Why structure matters
Free-form text forces you to regex or hope. Structured output lets you:
- Parse deterministically -- JSON.parse or schema validation instead of fragile string extraction.
- Compose in pipelines -- downstream code, databases, and agents consume typed data.
- Evaluate objectively -- schema validity and field-level checks are cheap eval scorers.
- Reduce tokens -- a compact object beats a verbose explanation when the consumer is code.
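As an illustration of the "cheap eval scorer" point, the sketch below computes the fraction of model outputs that parse and carry the required keys. The field names (`sentiment`, `confidence`) and the function names are assumptions chosen for the example, not part of any particular eval framework.

```python
import json

# Hypothetical required fields for the output being scored.
REQUIRED = ("sentiment", "confidence")


def is_valid(raw: str) -> bool:
    """True when the output is a JSON object containing every required key."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(obj, dict) and all(key in obj for key in REQUIRED)


def schema_validity_rate(outputs: list[str]) -> float:
    """Share of outputs that pass is_valid; 0.0 for an empty batch."""
    if not outputs:
        return 0.0
    return sum(is_valid(raw) for raw in outputs) / len(outputs)
```

A scorer like this can run on every eval batch because it needs no model call and no human labels.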
The failure mode you are designing against: the model returns almost-JSON, wraps output in markdown fences, adds a preamble ("Sure! Here is the JSON:"), or hallucinates a field your schema does not allow.
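A minimal sketch of defending against the fence and preamble cases, assuming the reply contains at most one JSON object. The helper name `extract_json_object` is hypothetical, and the approach (take the fenced body if present, then slice between the outermost braces) is one heuristic among several. It does not repair almost-JSON; it reports it.

```python
import json
import re

FENCE = "`" * 3  # markdown code fence, built this way to keep the example readable

# Captures the body of a fenced block, with or without a "json" language tag.
_FENCED = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL | re.IGNORECASE)


def extract_json_object(raw: str) -> dict:
    """Best-effort recovery of one JSON object from a model reply.

    Raises ValueError when no parseable object is found.
    """
    text = raw.strip()

    # 1. Prefer the contents of a markdown fence if one is present.
    fenced = _FENCED.search(text)
    if fenced:
        text = fenced.group(1).strip()

    # 2. Drop any preamble or trailing chatter around the outermost braces.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")

    try:
        return json.loads(text[start : end + 1])
    except json.JSONDecodeError as exc:
        raise ValueError(f"almost-JSON could not be parsed: {exc}") from exc
```

Parsing is only the first gate. A hallucinated field survives this step and has to be caught by schema validation.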
Three layers of structure
| Layer | Mechanism | Who enforces |
|---|---|---|
| Prompt-only | "Return only valid JSON matching …" in the instruction | You parse and retry; weakest |
| Native structured output | Provider API constrains generation to a JSON Schema (OpenAI response_format, Anthropic tool/schema modes, etc.) | Model + API |
| Tool / function calling | Model emits a structured tool call; your code executes | Schema on the tool definition |
Prefer native structured output or tool calling for anything that touches production logic. Prompt-only JSON works for prototypes and low-stakes internal tools.
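To make the second and third table rows concrete, the sketch below builds the extra request parameters for each as plain dictionaries, without calling any API. The field names follow the OpenAI `response_format` and Anthropic tool-definition shapes as publicly documented at the time of writing, which is an assumption to verify against the current provider reference. The schema, tool name, and function names are hypothetical.

```python
# Hypothetical schema used only for illustration.
SENTIMENT_SCHEMA = {
    "type": "object",
    "properties": {
        "sentiment": {"type": "string", "enum": ["positive", "negative", "neutral"]},
        "confidence": {"type": "number"},
    },
    "required": ["sentiment", "confidence"],
    "additionalProperties": False,
}


def native_structured_output_params(schema: dict) -> dict:
    """Request parameters in the OpenAI-style response_format shape (assumed)."""
    return {
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": "sentiment_result", "schema": schema},
        }
    }


def tool_calling_params(schema: dict) -> dict:
    """Request parameters in the Anthropic-style tool definition shape (assumed)."""
    return {
        "tools": [
            {
                "name": "record_sentiment",
                "description": "Record the sentiment of the given text.",
                "input_schema": schema,
            }
        ],
        # Ask the model to answer through this tool rather than in prose.
        "tool_choice": {"type": "tool", "name": "record_sentiment"},
    }
```

In both cases the same JSON Schema is the contract. What changes is where it is attached and which side enforces it.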
JSON Schema as the contract
Define the shape you expect explicitly:
{
  "type": "object",
  "properties": {
    "sentiment": { "type": "string", "enum": ["positive", "negative", "neutral"] },
    "confidence": { "type": "number", "minimum": 0, "maximum": 1 },
    "summary": { "type": "string", "maxLength": 200 }
  },
  "required": ["sentiment", "confidence"],
  "additionalProperties": false
}
Design schemas for machine consumption:
- Use enum where the set of values is fixed.
- Set additionalProperties: false when stray keys indicate hallucination.
- Keep required fields minimal -- optional fields reduce failure rate.
- Split large outputs into multiple calls or nested objects rather than one giant schema.
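The sentiment schema above can be enforced in code. The sketch below hand-checks each constraint using only the standard library so it stays dependency-free; in practice a JSON Schema validator library would normally do this work. The function name `validate_sentiment` and the error wording are assumptions for the example.

```python
ALLOWED_SENTIMENTS = {"positive", "negative", "neutral"}
ALLOWED_KEYS = {"sentiment", "confidence", "summary"}
REQUIRED_KEYS = {"sentiment", "confidence"}


def validate_sentiment(obj: object) -> list[str]:
    """Return human-readable violations; an empty list means valid."""
    if not isinstance(obj, dict):
        return ["top-level value must be an object"]

    errors = []
    for key in sorted(REQUIRED_KEYS - set(obj)):
        errors.append(f"missing required field: {key}")
    for key in sorted(set(obj) - ALLOWED_KEYS):
        errors.append(f"unexpected field: {key}")  # additionalProperties: false

    if "sentiment" in obj:
        value = obj["sentiment"]
        if not isinstance(value, str) or value not in ALLOWED_SENTIMENTS:
            errors.append("sentiment must be one of: negative, neutral, positive")

    if "confidence" in obj:
        value = obj["confidence"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not 0 <= value <= 1:
            errors.append("confidence must be a number between 0 and 1")

    if "summary" in obj:
        value = obj["summary"]
        if not isinstance(value, str) or len(value) > 200:
            errors.append("summary must be a string of at most 200 characters")

    return errors
```

Returning a list of messages instead of raising on the first problem gives a repair prompt everything it needs in one pass.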
Validation and repair loops
Even with structured modes, validate every response before use.
Repair prompt pattern: append the validation error and the invalid output; ask the model to fix only what failed. Often succeeds in one retry without a full re-generation.
Fallback pattern: after N failures, return a safe default, queue for human review, or degrade the feature (see AI in Products).
Cap retries (typically 1–2) to control cost and latency.
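The repair, fallback, and retry-cap patterns can be combined into one loop. This is a sketch under stated assumptions: `call_model` is a hypothetical callable that takes a prompt string and returns the raw model text, the validation is deliberately minimal, and the safe default is invented for the example.

```python
import json
from typing import Callable, Optional

MAX_REPAIRS = 2  # cap retries to bound cost and latency
SAFE_DEFAULT = {"sentiment": "neutral", "confidence": 0.0}  # hypothetical fallback


def check(raw: str) -> tuple[Optional[dict], Optional[str]]:
    """Parse and validate; return (value, None) or (None, error message)."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, f"invalid JSON: {exc.msg}"
    if not isinstance(obj, dict):
        return None, "top-level value must be an object"
    missing = [key for key in ("sentiment", "confidence") if key not in obj]
    if missing:
        return None, f"missing required fields: {', '.join(missing)}"
    return obj, None


def generate_with_repair(call_model: Callable[[str], str], prompt: str) -> dict:
    """One initial call, up to MAX_REPAIRS repair calls, then the fallback."""
    raw = call_model(prompt)
    for attempt in range(MAX_REPAIRS + 1):
        value, error = check(raw)
        if error is None:
            return value
        if attempt == MAX_REPAIRS:
            break  # retry budget spent
        # Repair prompt: show the error and the invalid output, ask for a fix only.
        repair_prompt = (
            f"{prompt}\n\nYour previous output was invalid.\n"
            f"Error: {error}\nPrevious output:\n{raw}\n"
            "Return the corrected JSON only, changing nothing else."
        )
        raw = call_model(repair_prompt)
    return dict(SAFE_DEFAULT)
```

With `MAX_REPAIRS = 2` the worst case is three model calls per request, which makes the cost and latency ceiling easy to reason about.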
Structured output vs tool use
They overlap but serve different roles:
- Structured output -- the answer is data (classification, extraction, generated config).
- Tool use -- the model requests an action; your code runs it and may return structured results back.
Many agent frameworks implement "respond with this JSON schema" as a special tool. For a single-shot extraction task, structured output is simpler; for "search, then format," tool use is natural.
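The distinction can be shown with a small dispatcher: a tool call triggers code and yields a structured result to send back to the model, while a final message is simply the data. The message shape (`type`, `name`, `arguments`, `data`), the `search` tool, and its in-memory corpus are all hypothetical and do not correspond to a specific framework.

```python
def search(query: str) -> list[str]:
    """Hypothetical tool: substring search over a tiny in-memory corpus."""
    corpus = ["refund policy", "shipping times", "refund form"]
    return [doc for doc in corpus if query in doc]


# Hypothetical tool registry: tool name -> Python callable.
TOOLS = {"search": search}


def handle_model_message(message: dict) -> dict:
    """Run a requested tool, or pass a final structured answer through."""
    if message.get("type") == "tool_call":
        name = message.get("name")
        tool = TOOLS.get(name)
        if tool is None:
            return {"type": "tool_error", "error": f"unknown tool: {name}"}
        result = tool(**message.get("arguments", {}))
        # The tool result is structured too, and goes back to the model.
        return {"type": "tool_result", "name": name, "content": result}
    # Otherwise the message itself is the answer: data, not an action.
    return {"type": "final", "data": message["data"]}
```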
Patterns that work in production
Extraction
Pull fields from unstructured text (invoice lines, support ticket metadata). Use a schema with required fields and enums, validate types, and repair on missing required keys.
Classification and routing
Small schema: { "intent": "...", "confidence": 0.0–1.0 }. Feed intent into model routing.
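A minimal routing sketch, assuming the classifier output has already been parsed into a dictionary. The route names, the default route, and the 0.7 confidence threshold are placeholders invented for the example.

```python
# Hypothetical route table and threshold, for illustration only.
ROUTES = {"billing": "small-fast-model", "legal": "large-careful-model"}
DEFAULT_ROUTE = "general-model"
MIN_CONFIDENCE = 0.7


def pick_route(result: dict) -> str:
    """Map a classification result to a route, falling back when unsure."""
    intent = result.get("intent")
    confidence = result.get("confidence", 0.0)
    if not isinstance(confidence, (int, float)) or confidence < MIN_CONFIDENCE:
        return DEFAULT_ROUTE  # low or malformed confidence: take the safe path
    if not isinstance(intent, str):
        return DEFAULT_ROUTE
    return ROUTES.get(intent, DEFAULT_ROUTE)
```

Treating a missing or low confidence as "use the default" keeps a bad classification from silently selecting a specialised route.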
Agent state
Pass structured state between steps (plan steps, open questions, file paths) instead of prose summaries -- pairs well with context compaction.
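One way to carry such state between steps is a small dataclass that round-trips through JSON. The class name `AgentState` and its methods are assumptions for this sketch; the three fields mirror the examples named above.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class AgentState:
    """Typed state handed from one agent step to the next."""

    plan_steps: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)
    file_paths: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, raw: str) -> "AgentState":
        data = json.loads(raw)
        # Reject stray keys instead of silently dropping them.
        unknown = set(data) - {"plan_steps", "open_questions", "file_paths"}
        if unknown:
            raise ValueError(f"unexpected state keys: {sorted(unknown)}")
        return cls(**data)
```

Because the serialised form is compact and keyed, a later step can read exactly the fields it needs without re-parsing a prose summary.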
Generated UI or config
Model outputs component props or feature-flag patches. Validate against schema and run a sandbox test before applying -- structure does not imply correctness.
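The sketch below separates the two gates: a structural check against a flag registry, then a stand-in "sandbox" check run on a copy of the config before anything is applied. The flag names, the registry, and the upload-size rule are hypothetical. A real sandbox test would exercise the actual system.

```python
import copy

# Hypothetical flag registry: flag name -> expected Python type.
FLAG_TYPES = {"new_checkout": bool, "max_upload_mb": int}


def validate_patch(patch: dict) -> list[str]:
    """Structural check: known flags only, each with the expected type."""
    errors = []
    for name, value in patch.items():
        expected = FLAG_TYPES.get(name)
        if expected is None:
            errors.append(f"unknown flag: {name}")
        elif type(value) is not expected:  # strict: True is not accepted as an int
            errors.append(f"{name} must be {expected.__name__}")
    return errors


def sandbox_check(config: dict) -> bool:
    """Stand-in for a real sandbox test: a rule the type check cannot express."""
    return 1 <= config["max_upload_mb"] <= 100


def apply_patch(live: dict, patch: dict) -> dict:
    """Return a new config with the patch applied, or raise ValueError."""
    errors = validate_patch(patch)
    if errors:
        raise ValueError("; ".join(errors))
    candidate = copy.deepcopy(live)  # never mutate the live config directly
    candidate.update(patch)
    if not sandbox_check(candidate):
        raise ValueError("patch is well-formed but failed the sandbox check")
    return candidate
```

A patch such as `{"max_upload_mb": 5000}` passes the structural check and still gets rejected, which is the case the paragraph above warns about.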
Common mistakes
- Huge schemas in one shot -- split into stages (extract entities, then relations).
- No validation -- trusting raw model output in SQL, shell, or payment flows.
- Unbounded strings -- use maxLength or post-truncate with explicit handling.
- Mixing instructions and data -- system prompt asks for JSON; user message also contains JSON examples that confuse the parser. Separate concerns clearly.
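Two of these mistakes have small mechanical fixes, sketched here together: truncation that reports itself instead of happening silently, and a parameterized SQL insert so model text never becomes part of the query. The table layout, the 200-character limit, and the function names are assumptions for the example, which uses the standard-library `sqlite3` module.

```python
import sqlite3

MAX_SUMMARY = 200  # hypothetical limit, matching a maxLength of 200


def bound_string(value: str, max_length: int) -> tuple[str, bool]:
    """Return (text, was_truncated) so callers handle truncation explicitly."""
    if len(value) <= max_length:
        return value, False
    return value[:max_length], True


def store_summary(conn: sqlite3.Connection, summary: str) -> bool:
    """Insert a bounded summary; return True when it had to be truncated."""
    text, truncated = bound_string(summary, MAX_SUMMARY)
    # Placeholders keep model output out of the SQL text itself.
    conn.execute(
        "INSERT INTO summaries (body, truncated) VALUES (?, ?)",
        (text, int(truncated)),
    )
    return truncated
```

Recording the `truncated` flag alongside the value lets downstream code and evals tell a short answer from a clipped one.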
See also
- AI Agents -- tool calling as structured action requests
- Evaluation & LLMOps -- schema validity as an eval scorer
- Cost, Latency & Model Routing -- shorter structured responses save tokens
- AI in Products -- error states when structure fails
- AI Glossary -- structured output and related terms