Smart AI Router + Token Plans

Cut your AI spend
in half.

Building apps. Running multi-agent swarms. Writing content. Bringing characters to life. Whatever you run on AI, Ozore's Smart AI Router + Discounted Token Plans cut what it costs.

More control than a coding plan. None of the hassle of managing raw tokens. Real savings on both.

One key. Drop-in compatible with Claude Code, OpenClaw, Hermes, Cursor, and any tool that uses the standard chat completions API.

Unlimited tokens on standard models· 20+ models, one key· No contracts· Cancel anytime
Core Feature

Ozore's Smart AI Router

Every request is classified in real time: complexity, category, context length, agentic depth. The router picks the cheapest model that can handle the job well. Simple question? Free standard model. Complex architecture review? Premium model, charged from your token pool. You never configure anything.

Auto Classification

Coding, writing, analysis, agentic orchestration: the router detects what you're doing and picks the right model tier. Session affinity preserves KV cache for multi-turn conversations.

📜

Compaction Engine

Long agentic sessions accumulate bloat: duplicate tool results, stale code blocks, base64 blobs. Ozore's compaction engine strips it automatically before each request, cutting input tokens without losing any meaningful context. Fewer tokens in = lower cost per call.

🔗

Multi-Tier Fallback

If the primary model is down, the router falls through a ranked chain of alternatives. Your request always gets a response. You never see a provider outage.

Frustration Escalation

If the router detects repeated failures or user frustration, it automatically escalates to a more capable model. No manual intervention needed.

You control the spend

Ozore's Smart AI Router has three modes. Switch anytime from your dashboard.

💰
Save

Maximize your token pool. The router sticks to standard models whenever possible and only escalates to premium when strictly necessary.

Balanced

The default. Ozore picks the best model for each request, using premium tokens when quality demands it.

Premium

Spend freely for maximum quality. The router prefers premium models even for tasks that standard models could handle.

How token plans work

Two tiers of models. One simple system.

Standard Models

Always free. Always unlimited.

MiniMax M3, Mimo V2.5 Pro, Gemini 3.1 Flash, Deepseek V4 and more. These handle the vast majority of coding, writing, and general tasks.

No token cost. No limits. Use them all day, every day.
Premium Models

Frontier power, metered by tokens.

GPT-5.4, GPT-5.5, Sonnet 4.6, Opus 4.7, Fable 5, Gemini 3.5 Flash, Grok Build, and more. Used when you need the absolute best quality.

Your monthly pool resets every billing cycle.

Why it costs so much less

Most agent requests don't need a frontier model. Ozore's Smart AI Router knows the difference, so you stop paying premium rates for work that standard models do just as well.

The old way

Everything runs on a frontier model

Your agent fires off hundreds of calls a session: file reads, small edits, lookups, planning. Run them all on Opus or Sonnet and you pay top-tier rates for every single one, even the trivial ones.

You're billed premium for 100% of requests.
The Ozore way

The router uses premium models, only when it counts

The same session, classified request-by-request. Routine calls go to unlimited free models. The hard ones (architecture, tricky debugging, final review) get a premium model from your token pool. And the compaction engine trims bloat from every request, so even premium calls use fewer tokens.

Smarter routing + smaller requests = real savings.
Compaction Engine

Smaller requests, same context

Long agentic sessions balloon with duplicate file reads, stale code blocks, base64 blobs, and tool output noise. Ozore's compaction engine strips all of it automatically before the request hits the model, keeping recent context and summaries intact. You get the same quality answers with significantly fewer input tokens billed.

Duplicate tool results
Collapsed to a single reference
Old code blocks
Older drafts summarized, latest version always preserved
Base64 / image blobs
Stripped from older turns, recent kept
Heartbeats & noise
Stripped from requests when not needed

Your exact savings depend on your workload and which router mode you pick. The more routine work in your sessions, the more you save. Standard models are always free, with unlimited tokens.

Pro & Max

Custom Router

Want full control? Build your own router on Ozore. Choose 3+ models from any provider and the router only uses your picks, with the same fallback chain, compaction engine, and session affinity behind it. Perfect for compliance, latency control, or personal preference.

Pick Your Models

Choose from 20+ models across OpenAI, Anthropic, Google, xAI, DeepSeek, and more. Mix standard and premium freely.

Smart Fallbacks

If your first-choice model is down, the router tries your other picks in order. No errors, no manual switching.

Built for Compliance

Restrict routing to the providers and regions you approve. Ideal for teams with data-residency or vendor requirements.

Pro & Max

Direct Pin

Already know exactly which model you want? Pin it directly with an ozore/* alias (for example ozore/opus-4.7 or ozore/gpt-5.4) and your request goes straight to that model, every time. No classification, no routing decisions.

Pin Any Model

Use ozore/* aliases to lock to a specific model on a per-request basis. Mix pinned calls and auto-routed calls freely.

Automatic Failover

If your pinned model is down or you run out of premium tokens, the Smart AI Router steps in so you still get a response instead of an error.

No Setup Required

Nothing to configure. Just change the model name in your client to an ozore/* alias whenever you want a specific model.

Simple pricing. Real savings.

Every plan includes unlimited tokens on standard models, full-length responses, and no time-based limits. Your premium token pool scales with your plan, and Ozore's Smart AI Router and compaction engine make it last.

Need more tokens mid-month? Buy extra usage at discounted rates anytime. It never expires.

Starter
$9/mo
For getting started
6M premium tokens/mo
  • 6M premium tokens/mo
  • Unlimited tokens on standard models
  • Smart AI Router
  • 1 agent slot
  • 50K context window
Basic
$18/mo
For daily AI-powered development
13M premium tokens/mo
  • 13M premium tokens/mo
  • Unlimited tokens on standard models
  • Smart AI Router
  • 3 agent slots
  • 80K context window
Most Popular
Pro
$32/mo
For serious AI workflows
25M premium tokens/mo
  • 25M premium tokens/mo
  • Unlimited tokens on standard models
  • Custom router
  • Direct model access
  • 2 API keys included
  • 4 agent slots
  • 100K context window
Max
$50/mo
For demanding multi-agent workflows
40M premium tokens/mo
  • 40M premium tokens/mo
  • Unlimited tokens on standard models
  • Custom router
  • Direct model access
  • 2 API keys included
  • 20% discount on additional premium tokens
  • 7 agent slots
  • 135K context window

Compare plans

Every plan includes unlimited tokens on standard models and full-length output.

StarterBasicProMax
Standard model requestsUnlimitedUnlimitedUnlimitedUnlimited
Premium tokens/mo6M13M25M40M
Agent slots1347
Context window50K80K100K135K
Smart AI Router
Custom Router
Direct model access
API keys included1122
Extra token discount20%
Who Ozore is for

Ozore is built for developers, personal AI agents, and hobbyists. It is not for consumer-facing apps or multi-person use.

Each environment needs its own API key. Starter and Basic include one; Pro and Max include two, and you can add more as you scale.

Building something consumer-facing, or need multiple seats across an enterprise? Contact us for custom pricing.

Built for multi-agent workflows

Run a fleet of agents in parallel, each with its own isolated session, all on one account.

All plans

Parallel Agent Slots

Up to 7 agents running side by side on Max. Each ozore/auto1…autoN slot keeps its own session state and routing decisions. No cross-talk, no shared limits.

Max + add-on

Multiple API Keys

Max includes 2 keys; every plan can add more for $5 each (+1 agent slot per key). Give each environment, repo, or teammate its own isolated key.

All plans

Isolated Sessions

Every agent slot keeps its own routing state, session affinity, and KV cache. One agent never interferes with another.

Add-ons

Customize your plan with optional extras.

Extended Context$5/mo

Doubles your plan's context window, up to 270K tokens on Max.

+2 Agent Slots$10/mo

Run more AI agents in parallel for multi-agent workflows.

Zero Data Retention$5/mo

We won't train on your data, and we won't send it to any model that trains on it.

Extra API Key + Agent Slot$5 once

One additional API key and +1 agent slot. Stackable.

Extra Premium Tokens$12.50/10M

Buy additional premium tokens anytime. Never expires. Consumed only after your monthly allocation runs out.

Drop-in compatible

OpenAI-compatible API. One key. Setup in under a minute.

Claude Code
OpenClaw
Hermes
Kilo Code
Pi
Cursor
Cline
Roo Code
Windsurf
Custom scripts
+ anything that calls an API

Questions

Everything you need to know about token plans.

It depends on your workload, but the math is simple: most agent requests (file reads, small edits, lookups, planning steps) run great on standard models, which are free with unlimited tokens. Ozore's Smart AI Router routes those automatically and only spends premium tokens on the requests that genuinely need a frontier model. On top of that, the compaction engine trims bloat from every request so even the premium calls use fewer tokens. The more routine work in your sessions, the more you save. Pick "Save" mode to push savings further, or "Premium" mode if you want top models on everything.

Start saving on your AI coding today.

Unlimited tokens on standard models. Premium tokens for the rest. One API key, set up in under a minute, and cancel anytime. No contracts.