Evaluation & LLMOps
How to test non-deterministic LLM systems with datasets, scorers, and LLM-as-judge; eval-driven development and harness engineering; and the LLMOps discipline of operating prompts, models, and agents in production.