Mar 04, 2026
Week 9: How LLMs Work, and Why That Changes How We Test Them
Tokens, temperature, training vs inference. The conceptual foundations that change how you write test assertions for AI systems.
Blog
Recent writing and implementation notes.
9 posts
Mar 04, 2026
Tokens, temperature, training vs inference. The conceptual foundations that change how you write test assertions for AI systems.
Feb 25, 2026
pytest advanced: fixtures, scopes, parametrization, conftest.py. The tools that make AI evaluation test suites actually work.
Feb 18, 2026
Learning pytest from scratch and applying it to six weeks of accumulated code. Good test naming is documentation.
Feb 11, 2026
Building a robust API caller with retry logic and exponential backoff. LLM APIs fail more than you expect.
Feb 04, 2026
Reading and writing files, JSON serialization, and the JSONL format that underpins most AI evaluation datasets.
Jan 28, 2026
The week Python scripting became Python engineering. Building a reusable test utilities package from scratch.
Jan 21, 2026
Building a validation library for LLM outputs. Deterministic checks for non-deterministic systems.
Jan 14, 2026
Learning Python data structures by building a test case manager. The list-of-dicts pattern shows up everywhere in AI testing.
Jan 07, 2026
Starting from zero Python to building a reusable prompt template engine in one week. Including one hour lost to PYTHONPATH.