Week 5 was practical infrastructure: reading and writing files, JSON serialization, pathlib for modern path handling, and JSONL for large datasets.
Sounds dry. It is not. If you test AI systems, you live in JSONL.
What I Built
dataset-loader - a dataset manager with two components:
- dataset_loader.py - reads .json and .jsonl files, provides iteration and filtering
- schema_validator.py - validates dataset entries against a defined schema
Sample datasets included for both formats.
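Here is a minimal sketch of the two components. The names (iter_jsonl, load_dataset, validate_entry) are illustrative, and the required-key schema is assumed from the sample records below rather than taken from the repo:

```python
import json
from pathlib import Path
from typing import Iterator

# Assumed schema, inferred from the sample records below - not the repo's real one.
REQUIRED_KEYS = {"id", "input", "expected", "category"}

def iter_jsonl(path: Path) -> Iterator[dict]:
    """Yield one parsed object per non-blank line of a .jsonl file."""
    with path.open(encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)

def load_dataset(path: str | Path) -> list[dict]:
    """Dispatch on suffix: .json holds one array, .jsonl one object per line."""
    path = Path(path)
    if path.suffix == ".jsonl":
        return list(iter_jsonl(path))
    return json.loads(path.read_text(encoding="utf-8"))

def validate_entry(entry: dict) -> list[str]:
    """Return a list of problems; an empty list means the entry passes."""
    return [f"missing key: {key}" for key in REQUIRED_KEYS - entry.keys()]
```

Filtering then falls out for free: `[e for e in load_dataset("sample.jsonl") if e.get("category") == "math"]` (file name hypothetical).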
The Honest Takeaway
JSONL (JSON Lines) is one object per line. No outer array. Each line is independently parseable.
{"id": "001", "input": "What is 2+2?", "expected": "4", "category": "math"}
{"id": "002", "input": "Summarize this in one sentence.", "expected": "...", "category": "summarization"}
Why does this matter for AI testing? Three reasons:
- Streaming. You can read a 100k-record eval dataset line by line without loading it all into memory.
- Debugging. When a test fails, you find the exact line. No nested array traversal. (See the sketch after this list.)
- Tooling compatibility. OpenAI fine-tuning, RAGAS, DeepEval, Langfuse - they all use JSONL. Learn it once, use it everywhere.
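That debugging point is worth seeing. Because each line stands alone, a bad record reports its exact line number instead of killing the whole load. A sketch, assuming a hypothetical eval_set.jsonl:

```python
import json

# Hypothetical file name. One malformed line is reported and skipped;
# every other record still loads - impossible with a single JSON array.
with open("eval_set.jsonl", encoding="utf-8") as f:
    for lineno, line in enumerate(f, start=1):
        if not line.strip():
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as err:
            print(f"line {lineno}: {err}")
            continue
        # ... run the eval case against `record` here ...
```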
Building the loader in week 5 meant that later, when eval frameworks expected this format, I already understood it from the ground up.
What’s Next
Week 6: Error handling and logging. LLM APIs fail. Building code that handles those failures gracefully.