If you’re using an AI coding assistant in 2025, tokens are your currency. Every message, every tool call, every line of terminal output injected into the context window costs you. And unlike human readers, LLMs don’t skim — they read everything.
So when I realized my test suite was silently burning hundreds of tokens per run, I fixed it in an afternoon.
The Setup: AI Needs Guardrails
LLMs are non-deterministic. The same prompt doesn’t always produce the same code. That’s why, when working with an AI pair programmer, you need constant verification:
- TypeScript — catches type errors the AI introduced
- Linting — enforces style and catches obvious mistakes
- Tests — confirms the behavior is actually correct
You end up running these checks constantly. Every time the AI makes a change, you verify. That verification output goes back into the LLM context — which means it costs tokens.
The Problem: 400 Tests, All Passing, All Useless
Our project recently crossed 400 tests. When everything passes, the default Vitest output looks like this:
✓ src/routers/__tests__/auth.test.ts (12 tests) 523ms ✓ src/routers/__tests__/games.test.ts (34 tests) 1204ms ✓ src/routers/__tests__/clubs.test.ts (8 tests) 312ms ✓ src/routers/__tests__/badges.test.ts (9 tests) 287ms ✓ src/routers/__tests__/connections.test.ts (21 tests) 891ms ✓ src/routers/__tests__/notifications.test.ts (7 tests) 198ms ✓ src/routers/__tests__/user.test.ts (15 tests) 445ms ... 27 more files
Test Files 34 passed (34) Tests 477 passed (477) Duration 18.43sEvery single line is noise. The AI doesn’t need to know which file took 523ms. It doesn’t need a ✓ for each suite. It needs one thing: did anything break?
When the answer is no, that’s 40+ lines of output injected into context for zero informational gain.
Vitest has a custom reporter API. You implement a simple interface and control exactly what gets printed. The contract I wanted:
- All pass → one line:
PASS (477) FAIL (0) - Any fail → summary line + only the failing tests with their error
Signal only, no noise.
The Implementation
I packaged this up and published it to npm as vitest-ai-reporter. Install it with:
npm install -D vitest-ai-reporter# orpnpm add -D vitest-ai-reporterThen wire it into your vitest.config.ts:
import { defineConfig } from 'vitest/config'import AIReporter from 'vitest-ai-reporter'
export default defineConfig({ test: { reporters: [new AIReporter()], },})The whole thing is 92 lines of TypeScript if you want to see exactly what’s happening under the hood.
Before vs After
Before (all passing, ~40 lines):
✓ src/routers/__tests__/auth.test.ts (12 tests) 523ms ✓ src/routers/__tests__/games.test.ts (34 tests) 1204ms ... 32 more files
Test Files 34 passed (34) Tests 477 passed (477) Duration 18.43sAfter (all passing, 1 line):
PASS (477) FAIL (0)After (with failures):
PASS (473) FAIL (4)
FAIL src/routers/__tests__/auth.test.ts:45:19 login > invalid credentials expected 400 to equal 200 45| expect(res.status).toBe(200) ^
FAIL src/routers/__tests__/user.test.ts:112:23 getFriends > returns empty list expected [] to deeply equal null 112| expect(result).toEqual(null) ^The AI gets exactly what it needs — no more, no less.
On a project with 34 test files and 477 tests, the default output is ~40 lines (~300 tokens). The compressed output is 1 line (~10 tokens). That’s a ~97% token reduction per test run. Run tests 20 times in a session and you’ve saved ~5,800 tokens — before accounting for failures, which add back only the signal you actually need.
The Bigger Principle
This is a small trick, but it points at something bigger: every tool that feeds output into an AI session should be shaped for AI consumption.
Default CLI tools were designed for humans who skim. AI reads everything. A passing test suite is a green light — one line is enough. A failing suite needs a precise error report — not a wall of stack traces, just the test name and what went wrong.
Tokens compound. Every pnpm test in a session, every CI log, every linter output. Start treating your tooling output as part of your prompt engineering.
For broader CLI output compression across git, docker, and more, RTK is worth a look.