Full-stack cloud + on-prem infrastructure built from scratch in one sprint.
In a single sprint we provisioned the complete Intellixer production stack: Google Cloud infrastructure via Terraform (GCE VM, Cloud SQL Postgres, Secret Manager, GCS backups, KMS encryption), a LiteLLM API gateway with Postgres-backed usage tracking, Presidio-based PII anonymisation pipeline, and the on-prem Mac Mini M4 inference node running MLX via a custom FastAPI shim behind Caddy TLS.
Client → api.intellixer.farm (GCE, LiteLLM) → dc1.webhop.me (Caddy/Mac M4) → mlx_lm
Mac Mini M4 (16 GB Unified Memory) runs quantised 3B-parameter models at 30+ tokens/second on the Apple Neural Engine — at a fraction of GPU cloud cost.