9router: Route AI Coding Agents Through 40+ Providers with 3-Tier Fallback

By Prahlad Menon Published 2026-05-22 5 min read

If you’re running AI coding agents seriously, you’ve hit the economics wall. Claude Code burns through API credits. Cursor Pro has usage caps. Codex bills per token. The per-session cost adds up fast when you’re iterating on real codebases — especially once context rot forces expensive re-prompts.

9router is a local routing proxy that sits between your coding agent and the LLM providers, routing each request through a 3-tier fallback chain: Subscription → Cheap → Free. It has 13.4K GitHub stars and has been trending for good reason — it makes multi-provider routing dead simple.

How It Works

Install globally and start routing:

npm install -g 9router
9router start

The dashboard opens at localhost:20128. Point your coding tool at 9router’s local endpoint instead of directly at OpenAI/Anthropic/Google, and it handles the rest.

The 3-tier fallback logic is straightforward:

Subscription tier — Your paid API keys (Anthropic, OpenAI, etc.). Used first when available and within budget limits.
Cheap tier — Lower-cost providers like DeepSeek, Groq, Together AI. Falls back here when subscription quotas are exhausted.
Free tier — Providers with free access like Kiro AI, OpenCode Free, and Vertex AI (which gives $300 in credits). The safety net that keeps your agent running even when you’ve burned through paid credits.

Each tier supports multi-account round-robin, so you can load-balance across multiple API keys per provider. If one key hits a rate limit, the next one picks up seamlessly.

Format Translation

This is where 9router gets genuinely useful beyond simple proxying. It translates between OpenAI, Claude, and Gemini API formats on the fly. Your tool speaks OpenAI format? 9router can route that request to Claude’s API or Gemini’s API without any configuration on the tool side.

This means you can use Claude Code’s interface but route certain requests through cheaper Gemini models, or use Cursor with providers it doesn’t natively support. The format translation covers 100+ models across 40+ providers.

RTK Token Saver

9router includes a built-in token compression feature called RTK (not to be confused with rtk the CLI proxy — though they’re complementary). 9router’s RTK specifically targets tool_result content in the conversation, compressing it by 20-40% before sending to the LLM.

The approach is different from standalone rtk, which intercepts shell commands at the OS level and compresses their output before it enters the agent’s context. 9router’s token saver operates at the API request level, compressing tool results that are already in the conversation history. You could actually run both — rtk compresses output at the shell, and 9router compresses what’s left at the API layer. Double compression for the token-conscious. If you’re also looking at MCP vs skills for token efficiency, the savings stack.

The Dashboard

The web dashboard at localhost:20128 gives you real-time visibility into:

Request routing — which provider handled each request and why
Token usage — per-provider, per-model breakdowns
Cost tracking — estimated spend across all tiers
Fallback events — when and why requests fell through to cheaper tiers
Account status — rate limits, quotas, and health per API key

This is the kind of observability that’s painful to build yourself. When you’re juggling multiple providers and API keys, knowing where your tokens are going matters.

Free Provider Highlights

The free tier is surprisingly capable:

Kiro AI — Amazon’s coding agent, accessible through 9router’s routing
OpenCode Free — Free tier access to coding-optimized models
Vertex AI — Google Cloud gives $300 in free credits, which goes a long way with Gemini models

For side projects or experimentation, you can run AI coding agents at literally zero cost by configuring only the free tier.

When to Use This

9router makes sense if you:

Run multiple AI coding tools and want unified provider management
Want automatic fallback when one provider is down or rate-limited
Have multiple API keys and want round-robin load balancing
Need to stretch a budget by mixing paid and free providers
Want visibility into where your AI spend is actually going

It’s a routing and orchestration layer. It doesn’t change how your coding agent works — it changes how requests reach the LLM. That’s a clean separation of concerns.

Getting Started

npm install -g 9router
9router start
# Dashboard: http://localhost:20128
# Configure providers, add API keys, set fallback tiers
# Point your coding tool at 9router's local endpoint

The GitHub repo has detailed setup guides for Claude Code, Cursor, Codex, and Cline. Configuration is YAML-based and the defaults are sensible.

At 13.4K stars and actively maintained, 9router is one of the more mature tools in the AI coding infrastructure space. If you’re spending real money on AI coding and want more control over where those tokens go, it’s worth the 5-minute setup.

rtk: A Rust CLI Proxy That Cuts AI Agent Token Usage 60-90% — Complementary shell-level compression
Skills vs MCP: Token Efficiency for AI Agents — Another angle on reducing token waste
Context Rot in AI Coding Agents — Why long sessions get expensive
Personal AI Agents: OpenRouter Rankings 2026 — Comparing provider options
Karpathy’s Four Rules for AI Coding Agents — Best practices for agent configuration