AI in Products

Building with LLMs as an engineer is one skill; shipping LLM features to users is another. Users do not care about RAG or agents -- they care whether the feature is fast, trustworthy, and worth the occasional wrong answer. This page covers product and UX decisions for AI-powered features, without diving into model internals.

Interaction models

| Pattern | User experience | Best when | Risks |
| --- | --- | --- | --- |
| Chat | Open-ended dialogue | Exploration, support, copilot | Scope creep, long threads, unclear limits |
| Inline / copilot | AI inside an existing workflow (editor, form, dashboard) | Task-specific assist where context is already on screen | Interrupting flow; unclear what AI can see |
| Background automation | AI runs without real-time interaction (summaries, tagging, routing) | Batch work, notifications, prep for human review | Silent failures; users do not know AI was involved |
| Generative fill | One-shot: "draft this" or "complete this field" | Templates, emails, descriptions | Over-reliance; users stop editing |
| Agent with approval | AI proposes actions; user confirms before execution | Destructive or high-stakes ops | Friction if overused -- see Human-in-the-Loop |

Most successful products combine patterns: inline suggestions plus chat for follow-up, or background classification plus a human queue for exceptions.
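As an illustration of the last combination, here is a minimal sketch of background classification with a human queue for exceptions. The `Classification` and `Router` names, the confidence score, and the 0.8 cutoff are all hypothetical; a real system would calibrate the threshold against labeled data, and may need a different signal than a raw model score.

```python
from dataclasses import dataclass, field


@dataclass
class Classification:
    label: str
    confidence: float  # assumed to be a score in [0, 1] from the classifier


@dataclass
class Router:
    threshold: float = 0.8  # illustrative cutoff, not a recommendation
    auto_applied: list = field(default_factory=list)
    human_queue: list = field(default_factory=list)

    def route(self, item_id: str, result: Classification) -> str:
        """Apply confident results automatically; queue the rest for a person."""
        if result.confidence >= self.threshold:
            self.auto_applied.append((item_id, result.label))
            return "auto"
        self.human_queue.append((item_id, result.label))
        return "human_review"


router = Router()
router.route("ticket-1", Classification("billing", 0.95))
router.route("ticket-2", Classification("billing", 0.41))
```

Keeping the suggested label on queued items lets the reviewer confirm or correct it rather than start from scratch.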

Streaming and perceived performance

LLM responses often take several seconds. Streaming (showing tokens as they arrive) improves perceived speed even when total time is unchanged.

Product guidelines:

  • Stream text the user will read (answers, drafts). Do not stream internal chain-of-thought unless you intend to expose reasoning.
  • Show progress for multi-step agents ("Searching…", "Running tests…") so silence is not mistaken for a hang.
  • Allow cancel -- long runs need a stop button; partial results should be usable or clearly discarded.
  • Set expectations -- "This usually takes 10–20 seconds" beats a blank spinner.
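The streaming, progress, and cancel guidelines above can be sketched as a small generator. This is a simplified illustration, not a specific provider's API: `stream_with_cancel`, the `on_status` callback, and the status strings are hypothetical, and the token source is any iterable of strings.

```python
import threading
from typing import Callable, Iterable, Iterator


def stream_with_cancel(
    tokens: Iterable[str],
    cancel: threading.Event,
    on_status: Callable[[str], None],
) -> Iterator[str]:
    """Yield tokens as they arrive, report progress, and stop on cancel."""
    on_status("Generating...")
    for token in tokens:
        # Check the stop button before emitting each token.
        if cancel.is_set():
            on_status("Cancelled")
            return
        yield token
    on_status("Done")


cancel = threading.Event()
statuses: list[str] = []
partial: list[str] = []
for token in stream_with_cancel(["Hello", ", ", "world"], cancel, statuses.append):
    partial.append(token)  # the UI would append this to the visible draft
```

The caller keeps whatever was collected in `partial` when the user cancels, so the product can decide whether to keep it as a usable draft or discard it explicitly.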

For background jobs, prefer email or in-app notification over blocking the UI.

Trust, transparency, and control

Users trust AI features more when they understand boundaries:

  • Disclose AI involvement where it affects decisions (support replies, content moderation, recommendations).
  • Show sources when RAG or search grounds the answer -- citations beat "the AI said so."
  • Make undo easy -- especially for generative edits; treat AI output as a draft, not a commit.
  • Offer escape hatches -- "Talk to a human," "Use manual mode," "Turn off AI for this project."

Regulatory and contractual context for data sent to models is covered under Privacy & Data Handling.

When not to use AI

AI is the wrong default when:

| Situation | Better approach |
| --- | --- |
| Deterministic correctness required | Rules, validators, traditional code (tax calc, ACL checks) |
| Latency budget under ~200ms | Cached lookups, heuristics, small local models only if proven fast enough |
| Rare edge cases dominate | Human process or explicit rule tables; LLMs miss long-tail cases |
| Liability without review | Human approval or non-AI fallback for legal/medical/financial claims |
| Cost exceeds user value | Simpler UX without AI; users will not pay (directly or indirectly) for the feature |
| Training data is stale and there is no retrieval | Fix data pipeline or add RAG before adding a model |

A useful product question: "If the model were wrong 5% of the time, would this feature still be valuable with recovery paths?" If not, do not ship it as a pure-AI feature.

Error states and failure UX

Models fail both loudly (visible errors) and silently (fluent but wrong output):

  • API errors -- rate limits, timeouts, provider outages. Show a clear message and a retry option; do not spin indefinitely.
  • Refusals -- policy blocks. Explain briefly and offer an alternative path.
  • Low-quality output -- wrong but fluent. Confidence UI is hard; prefer validation + structured outputs for machine-checked fields.
  • Partial agent failure -- one tool fails mid-loop. Surface what succeeded and what did not; do not pretend completion.
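For the API-error case, one way to avoid an endless spinner is a bounded retry with exponential backoff that gives up and lets the UI show a message. The sketch below is an assumption-laden illustration: `TransientError` is a hypothetical stand-in for whatever rate-limit or timeout exceptions a given client raises, and the attempt count and delays are placeholders.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical stand-in for rate limits, timeouts, and provider outages."""


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying transient failures a bounded number of times."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up; the caller shows a clear error message
            # Exponential backoff: base_delay, then 2x, then 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the behavior can be tested without real waiting.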

Design for graceful degradation: if AI is unavailable, the core product still works (manual mode, cached answer, queued for later). Feature flags let you disable AI per tenant or region without redeploying.
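A minimal sketch of that degradation path, assuming a per-tenant flag lookup and a manual mode as the fallback. The function names (`summarize`, `ai_enabled_for`, `ai_summarize`) and the response shape are hypothetical; production code would catch narrower exception types than the broad `Exception` used here.

```python
from typing import Callable


def summarize(
    text: str,
    tenant_id: str,
    ai_enabled_for: Callable[[str], bool],
    ai_summarize: Callable[[str], str],
) -> dict:
    """Return an AI summary when allowed and available, else fall back to manual mode."""
    if not ai_enabled_for(tenant_id):
        # Flag is off for this tenant: skip the model call entirely.
        return {"summary": None, "mode": "manual", "notice": None}
    try:
        return {"summary": ai_summarize(text), "mode": "ai", "notice": None}
    except Exception:  # broad on purpose in this sketch
        return {
            "summary": None,
            "mode": "manual",
            "notice": "AI summary is unavailable right now. You can write one manually.",
        }


flags = {"tenant-a": True, "tenant-b": False}
result = summarize("long text", "tenant-b", lambda t: flags.get(t, False), str.upper)
```

Defaulting unknown tenants to `False` in the flag lookup means a missing flag disables AI rather than enabling it.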

Measuring product success

Engineering evals ask "is the model good?" Product metrics ask "is the feature good?"

Track both:

  • Task success rate -- did the user accomplish their goal?
  • Edit distance -- how much users change AI drafts before accepting
  • Override / thumbs-down rate -- explicit dissatisfaction signals
  • Time to complete -- AI should reduce work, not add review burden
  • Support tickets -- spikes after launching an AI feature are a red flag
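Two of these metrics are easy to compute from logged events. The sketch below uses Python's `difflib.SequenceMatcher` similarity ratio as a rough proxy for edit distance (it is not a true Levenshtein distance), and the event names `"accept"`, `"thumbs_down"`, and `"override"` are hypothetical labels for whatever the product actually logs.

```python
from difflib import SequenceMatcher


def edit_fraction(ai_draft: str, accepted_text: str) -> float:
    """Roughly how much of the draft changed: 0.0 = untouched, 1.0 = fully rewritten."""
    return 1.0 - SequenceMatcher(None, ai_draft, accepted_text).ratio()


def negative_signal_rate(events: list[str]) -> float:
    """Share of logged events that are explicit dissatisfaction signals."""
    if not events:
        return 0.0
    negatives = sum(1 for e in events if e in {"thumbs_down", "override"})
    return negatives / len(events)


untouched = edit_fraction("Thanks for reaching out.", "Thanks for reaching out.")
rate = negative_signal_rate(["accept", "thumbs_down", "accept", "override"])
```

A consistently high edit fraction suggests users are treating drafts as a starting point they mostly rewrite, which is worth comparing against time to complete.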

Run qualitative sessions early; users often want less AI surface area than engineers assume.

Accessibility and inclusion

  • Do not rely on color alone for AI-generated status (errors, confidence).
  • Ensure streamed content works with screen readers (live regions, sensible updates).
  • Provide non-AI paths for users who opt out or use assistive workflows that conflict with chat UIs.

See also