thingsithinkithink

Artificial intelligence

Claude Agent SDK Part 5: Editing Files with Checkpointing

Letting the agent create posts that follow my templates, with checkpointing to recover from mistakes.

Claude Agent SDK Part 4: Implementing Context Profiles

Building context profiles and usage tracking that work with the SDK's design.

Claude Agent SDK Part 3: The Context Control Problem

Discovering the trade-offs between agency and control when building on the Claude Agent SDK.


Recent Posts

Three Things I Did Over Christmas

One of the nice things about time off is the chance to play a little.

LLM Evals Course: Complete Course Recap

This is my recap of Hamel and Shreya's LLM evaluation course. I hope to come back here whenever I need to remind myself how to do this the right way.

LLM Evals Lesson 8: Improving LLM Products

Notes from the final lesson of Hamel and Shreya's LLM evaluation course: practical strategies for improving accuracy and reducing costs through prompt refinement, architecture changes, fine-tuning, and model cascades.

LLM Evals Course Lesson 7: Interfaces for Human Review

Notes from lesson 7 of Hamel and Shreya's LLM evaluation course: interface design principles and strategic sampling.

Building an AI Sandbox with Docker

How to set up a persistent Docker environment for AI coding tools without losing your authentication every time you restart the container.

LLM Evals Course Lesson 6: Complex Pipelines and CI/CD

Notes from lesson 6 of Hamel and Shreya's LLM evaluation course: debugging agentic systems, handling complex data modalities, and implementing CI/CD for production LLM applications.