Every request is classified in real time: complexity, category, context length, agentic depth. The router picks the cheapest model that can handle the job well. Simple question? Free standard model. Complex architecture review? Premium model, charged from your token pool. You never configure anything.
Coding, writing, analysis, agentic orchestration: the router detects what you're doing and picks the right model tier. Session affinity preserves KV cache for multi-turn conversations.
Long agentic sessions accumulate bloat: duplicate tool results, stale code blocks, base64 blobs. Ozore's compaction engine strips it automatically before each request, cutting input tokens without losing any meaningful context. Fewer tokens in = lower cost per call.
If the primary model is down, the router falls through a ranked chain of alternatives. Your request always gets a response. You never see a provider outage.
If the router detects repeated failures or user frustration, it automatically escalates to a more capable model. No manual intervention needed.
Ozore's Smart AI Router has three modes. Switch anytime from your dashboard.
Maximize your token pool. The router sticks to standard models whenever possible and only escalates to premium when strictly necessary.
The default. Ozore picks the best model for each request, using premium tokens when quality demands it.
Spend freely for maximum quality. The router prefers premium models even for tasks that standard models could handle.
Two tiers of models. One simple system.
Always free. Always unlimited.
MiniMax M3, Mimo V2.5 Pro, Gemini 3.1 Flash, Deepseek V4 and more. These handle the vast majority of coding, writing, and general tasks.
Frontier power, metered by tokens.
GPT-5.4, GPT-5.5, Sonnet 4.6, Opus 4.7, Fable 5, Gemini 3.5 Flash, Grok Build, and more. Used when you need the absolute best quality.
Most agent requests don't need a frontier model. Ozore's Smart AI Router knows the difference, so you stop paying premium rates for work that standard models do just as well.
Your agent fires off hundreds of calls a session: file reads, small edits, lookups, planning. Run them all on Opus or Sonnet and you pay top-tier rates for every single one, even the trivial ones.
The same session, classified request-by-request. Routine calls go to unlimited free models. The hard ones (architecture, tricky debugging, final review) get a premium model from your token pool. And the compaction engine trims bloat from every request, so even premium calls use fewer tokens.
Long agentic sessions balloon with duplicate file reads, stale code blocks, base64 blobs, and tool output noise. Ozore's compaction engine strips all of it automatically before the request hits the model, keeping recent context and summaries intact. You get the same quality answers with significantly fewer input tokens billed.
Your exact savings depend on your workload and which router mode you pick. The more routine work in your sessions, the more you save. Standard models are always free, with unlimited tokens.
Want full control? Build your own router on Ozore. Choose 3+ models from any provider and the router only uses your picks, with the same fallback chain, compaction engine, and session affinity behind it. Perfect for compliance, latency control, or personal preference.
Choose from 20+ models across OpenAI, Anthropic, Google, xAI, DeepSeek, and more. Mix standard and premium freely.
If your first-choice model is down, the router tries your other picks in order. No errors, no manual switching.
Restrict routing to the providers and regions you approve. Ideal for teams with data-residency or vendor requirements.
Already know exactly which model you want? Pin it directly with an ozore/* alias (for example ozore/opus-4.7 or ozore/gpt-5.4) and your request goes straight to that model, every time. No classification, no routing decisions.
Use ozore/* aliases to lock to a specific model on a per-request basis. Mix pinned calls and auto-routed calls freely.
If your pinned model is down or you run out of premium tokens, the Smart AI Router steps in so you still get a response instead of an error.
Nothing to configure. Just change the model name in your client to an ozore/* alias whenever you want a specific model.
Every plan includes unlimited tokens on standard models, full-length responses, and no time-based limits. Your premium token pool scales with your plan, and Ozore's Smart AI Router and compaction engine make it last.
Need more tokens mid-month? Buy extra usage at discounted rates anytime. It never expires.
Every plan includes unlimited tokens on standard models and full-length output.
| Starter | Basic | Pro | Max | |
|---|---|---|---|---|
| Standard model requests | Unlimited | Unlimited | Unlimited | Unlimited |
| Premium tokens/mo | 6M | 13M | 25M | 40M |
| Agent slots | 1 | 3 | 4 | 7 |
| Context window | 50K | 80K | 100K | 135K |
| Smart AI Router | ✓ | ✓ | ✓ | ✓ |
| Custom Router | — | — | ✓ | ✓ |
| Direct model access | — | — | ✓ | ✓ |
| API keys included | 1 | 1 | 2 | 2 |
| Extra token discount | — | — | — | 20% |
Ozore is built for developers, personal AI agents, and hobbyists. It is not for consumer-facing apps or multi-person use.
Each environment needs its own API key. Starter and Basic include one; Pro and Max include two, and you can add more as you scale.
Building something consumer-facing, or need multiple seats across an enterprise? Contact us for custom pricing.
Run a fleet of agents in parallel, each with its own isolated session, all on one account.
Up to 7 agents running side by side on Max. Each ozore/auto1…autoN slot keeps its own session state and routing decisions. No cross-talk, no shared limits.
Max includes 2 keys; every plan can add more for $5 each (+1 agent slot per key). Give each environment, repo, or teammate its own isolated key.
Every agent slot keeps its own routing state, session affinity, and KV cache. One agent never interferes with another.
Customize your plan with optional extras.
Doubles your plan's context window, up to 270K tokens on Max.
Run more AI agents in parallel for multi-agent workflows.
We won't train on your data, and we won't send it to any model that trains on it.
One additional API key and +1 agent slot. Stackable.
Buy additional premium tokens anytime. Never expires. Consumed only after your monthly allocation runs out.
OpenAI-compatible API. One key. Setup in under a minute.
Everything you need to know about token plans.
Unlimited tokens on standard models. Premium tokens for the rest. One API key, set up in under a minute, and cancel anytime. No contracts.