// ACHIEVEMENTS.LOG / MAY 21, 2026
◈ TECHNICAL

Load Testing Framework

12-cell matrix harness measures throughput across size, concurrency, and history dimensions.

Before optimising performance, you need to measure it. We built a purpose-designed load-testing harness that stress-tests the inference stack across three independent dimensions at once: request size, concurrency, and history.

Test matrix

Dimension    | Values                                              | Purpose
Request size | small (128 tok), medium (384 tok), heavy (1024 tok) | Model the full token range
Concurrency  | serial, parallel (10 simultaneous)                  | Stress the inference queue
Context      | no history, with history                            | Measure prompt-length degradation
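
As a rough illustration of how the matrix expands, the three dimensions can be crossed with a Cartesian product. This is a minimal sketch, not the harness's actual code: the names REQUEST_SIZES, CONCURRENCY, CONTEXT, and build_matrix are hypothetical, and only the values come from the table and the 10 requests per cell used by the harness.

```python
from itertools import product

# Values come from the test matrix; the names and layout are illustrative assumptions.
REQUEST_SIZES = {"small": 128, "medium": 384, "heavy": 1024}  # token budget per request
CONCURRENCY = {"serial": 1, "parallel": 10}                   # simultaneous requests
CONTEXT = ("no_history", "with_history")
REQUESTS_PER_CELL = 10


def build_matrix():
    """Return one dict per cell of the size x concurrency x context matrix."""
    return [
        {
            "size": size,
            "tokens": REQUEST_SIZES[size],
            "concurrency": mode,
            "workers": CONCURRENCY[mode],
            "context": ctx,
        }
        for size, mode, ctx in product(REQUEST_SIZES, CONCURRENCY, CONTEXT)
    ]


cells = build_matrix()
total_requests = len(cells) * REQUESTS_PER_CELL
print(len(cells), total_requests)  # prints "12 120"
```

Keeping the dimensions as plain data means a new size or concurrency level would extend the matrix without touching the loop.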

12 cells × 10 requests each = 120 total requests per run. The harness uses the Qwen tokenizer for accurate token budgeting and records wall time, tokens/second, HTTP status, and full usage breakdown per request.
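
The per-request measurements could be captured along the following lines. This is a sketch under stated assumptions rather than the harness itself: RequestRecord, timed_request, and fake_send are hypothetical names, the usage dictionary is assumed to follow the common prompt_tokens / completion_tokens / total_tokens shape, and tokens/second is assumed to mean completion tokens divided by wall time.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class RequestRecord:
    wall_time_s: float
    tokens_per_second: float
    http_status: int
    usage: Dict[str, int]


def timed_request(send: Callable[[], Tuple[int, Dict[str, int]]]) -> RequestRecord:
    """Time one request; `send` is a hypothetical callable returning (status, usage)."""
    start = time.perf_counter()
    status, usage = send()
    wall = time.perf_counter() - start
    # Assumption: throughput counts generated (completion) tokens only.
    completion = usage.get("completion_tokens", 0)
    tps = completion / wall if wall > 0 else 0.0
    return RequestRecord(wall, tps, status, usage)


def fake_send() -> Tuple[int, Dict[str, int]]:
    """Stand-in for a real HTTP call so the sketch runs offline."""
    time.sleep(0.05)
    return 200, {"prompt_tokens": 128, "completion_tokens": 64, "total_tokens": 192}


record = timed_request(fake_send)
print(record.http_status, round(record.wall_time_s, 3), round(record.tokens_per_second, 1))
```

Passing the transport in as a callable keeps the timing logic independent of whichever HTTP client or backend is under test.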

Key finding

Peak throughput: 33.59 tok/s (small, serial, no history). The Semaphore(1) serialisation in the shim caused a 10× collapse under parallel load, which was the primary driver of the Ollama migration.
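
The queueing effect behind that collapse can be reproduced in a toy model. The sketch below is not the shim's code: it assumes an asyncio-style semaphore guarding each request, uses asyncio.sleep as a stand-in for inference, and the names handle and run_batch are hypothetical. With a limit of 1, ten "parallel" requests are processed one after another.

```python
import asyncio
import time


async def handle(sem: asyncio.Semaphore, work_s: float) -> None:
    # Only `limit` coroutines may hold the semaphore at once.
    async with sem:
        await asyncio.sleep(work_s)  # stand-in for one inference call


async def run_batch(limit: int, n_requests: int = 10, work_s: float = 0.02) -> float:
    """Fire n_requests at once and return the wall time for the whole batch."""
    sem = asyncio.Semaphore(limit)
    start = time.perf_counter()
    await asyncio.gather(*(handle(sem, work_s) for _ in range(n_requests)))
    return time.perf_counter() - start


serialised = asyncio.run(run_batch(limit=1))   # requests queue behind each other
concurrent = asyncio.run(run_batch(limit=10))  # requests overlap
print(f"Semaphore(1): {serialised:.2f}s, Semaphore(10): {concurrent:.2f}s")
```

The sleep-based model is idealised: a real backend would not scale perfectly with concurrency, so this shows why a single-permit semaphore turns parallel load into a queue, not the measured figures themselves.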