LLM Evals Course Lesson 6: Complex Pipelines and CI/CD
Notes from lesson 6 of Hamel and Shreya's LLM evaluation course - debugging agentic systems, handling complex data modalities, and implementing CI/CD for production LLM applications.
Notes from lesson 5 of Hamel and Shreya's LLM evaluation course - evaluating retrieval quality, generation quality, and common pitfalls in RAG systems.
Swyx argues for 2025-2035 as the decade of AI agents, backed by unprecedented infrastructure investment and converging technical definitions.
Notes from the first lesson of Parlance Lab's Maven course on evaluating LLM applications - covering the Three Gulfs model and why eval is where most people get stuck.
Trying to blend two AI framework styles into one that's more practically useful.
I like bits of Brunig's and Mollick's AI frameworks, but neither quite works for me.
A systematic approach to analysing and improving large language model applications through error analysis.
Why evaluation-driven experimentation creates better roadmaps in AI products.
Understanding the combinatorial complexity problem that plagues many software systems, and how modern architectures solve it.