Air Max: every AI model, working together in one answer
Air Max is AIR's hybrid engine. It routes each message to the model best suited to it and, on the hardest tasks, runs several models together and merges their work into one superior answer.

No single AI model is best at everything. One is sharpest at reasoning, another handles enormous documents without losing the thread, another is fastest for quick replies, another is strongest at live web research. For years the trade-off was forced on you: pick one model and live with its weak spots. Air Max removes that trade-off. It is not another model. It is a conductor that puts each model to work on the part of your request it does best.
This article explains exactly how Air Max decides what to do with each message, when it uses one model versus several, and why combining models can produce a better answer than any of them alone — while still staying cost-efficient for the everyday questions that don't need the full orchestra.
Two modes, chosen automatically
Air Max works in two modes and switches between them by itself, per message. The first is the Router. For the vast majority of messages — a quick question, a rewrite, an image request, an everyday explanation — Air Max simply picks the single best engine for that specific task and answers with it. That keeps things fast and cheap: one model, one answer, no waste.
The second is the Ensemble. When a message is genuinely hard — deep research, complex analysis, a multi-part strategic ask — Air Max runs more than one model and merges their work. A fast, long-context model drafts broad coverage of the problem; a deep reasoning model then verifies that draft, corrects mistakes, fills gaps, and elevates it into one final, polished answer. You see a single response — but two engines shaped it.

| Your message | Mode | What runs |
|---|---|---|
| Quick question / rewrite | Router | One fast Flash engine |
| Everyday explanation | Router | Balanced Flash model |
| Code / focused analysis | Router | Gemini 3.1 Pro |
| Deep research / strategy | Ensemble | Gemini 3.1 Pro + Claude Sonnet 5 |
| Document creation | Router | Claude Sonnet 5 (doc-grade) |

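
To make the table concrete, here is a minimal Python sketch of a per-message router. It is an illustration under stated assumptions, not AIR's implementation: the article does not say how Air Max judges a message, so the keyword-based `classify` function, the category names, and the two Flash identifiers (`flash-fast`, `flash-balanced`) are all hypothetical. Only the mode and engine pairings come from the table.

```python
from dataclasses import dataclass
from typing import Tuple

# Engine identifiers. The two Flash ids are hypothetical placeholders;
# the article only names "Flash", "Gemini 3.1 Pro" and "Claude Sonnet 5".
FLASH_FAST = "flash-fast"
FLASH_BALANCED = "flash-balanced"
GEMINI_PRO = "Gemini 3.1 Pro"
CLAUDE_SONNET = "Claude Sonnet 5"


@dataclass(frozen=True)
class Plan:
    mode: str                 # "router" or "ensemble"
    engines: Tuple[str, ...]  # engines involved, in the table's order


# Mirrors the table: task category -> mode and engines.
ROUTES = {
    "quick": Plan("router", (FLASH_FAST,)),
    "everyday": Plan("router", (FLASH_BALANCED,)),
    "code": Plan("router", (GEMINI_PRO,)),
    "deep_research": Plan("ensemble", (GEMINI_PRO, CLAUDE_SONNET)),
    "document": Plan("router", (CLAUDE_SONNET,)),
}


def classify(message: str) -> str:
    """Toy keyword classifier (an assumption, for illustration only)."""
    text = message.lower()
    if any(w in text for w in ("research", "strategy", "market analysis")):
        return "deep_research"
    if any(w in text for w in ("write a report", "draft a document", "proposal")):
        return "document"
    if any(w in text for w in ("code", "function", "bug", "analyze")):
        return "code"
    if len(text.split()) <= 12:
        return "quick"
    return "everyday"


def plan_for(message: str) -> Plan:
    """Pick the mode and engines for a single message."""
    return ROUTES[classify(message)]
```

For example, `plan_for("Fix typo: teh")` resolves to the Router with one Flash engine, while a message that asks for research and a strategy resolves to the Ensemble with two engines.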
Why combining models actually helps
Two models help for the same reason two experts help: they catch each other's blind spots. A model that is excellent at breadth may state something confidently that a deeper reasoning model would flag as wrong. When the second model treats the first's output as research to verify rather than truth to repeat, errors get caught before they reach you, and the final answer carries the strengths of both — coverage plus rigor.
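
The draft-then-verify idea can be sketched in a few lines of Python. This is a simplified illustration, assuming each model is just a function from prompt text to answer text; the function names, the review prompt wording, and the toy stand-in models are hypothetical and say nothing about how Air Max is built internally.

```python
from typing import Callable

# A model is treated here as a plain function: prompt in, text out.
Model = Callable[[str], str]


def ensemble_answer(question: str, drafter: Model, verifier: Model) -> str:
    """Run a breadth-first draft, then have a second model verify it."""
    draft = drafter(question)
    # The verifier is told to treat the draft as research to check,
    # not as truth to repeat.
    review_prompt = (
        "Treat the draft below as unverified research, not as truth.\n"
        "Check each claim, correct mistakes, fill gaps, "
        "and write the final answer.\n\n"
        f"Question: {question}\n\nDraft:\n{draft}"
    )
    return verifier(review_prompt)


# Toy stand-ins so the sketch runs without any real model API.
def toy_drafter(prompt: str) -> str:
    # Broad coverage, but with one confident mistake.
    return "Water boils at 90 C at sea level. It freezes at 0 C."


def toy_verifier(prompt: str) -> str:
    # Pretend verification: pull out the draft and fix the known error.
    draft = prompt.split("Draft:\n", 1)[1]
    return draft.replace("90 C", "100 C")
```

Calling `ensemble_answer(question, toy_drafter, toy_verifier)` returns a single response in which the drafter's boiling-point error has been corrected, which is the coverage-plus-rigor effect described above in miniature.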
This is why the ensemble is reserved for the hard 20% of tasks. On a simple question there is nothing to reconcile, so running several models would only add cost and latency for no gain. On a complex one, the cross-check is exactly where the quality comes from.
Powerful, but still cost-efficient
The instinct with a 'use every model' feature is to assume it must be expensive. For normal use, Air Max is designed to be the opposite. Because it routes most messages to a single, appropriately sized engine, everyday questions cost about what they would on a mid-tier model, not a flagship. The multi-model ensemble, which is the pricier path, only fires when a task is genuinely worth it.
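
A rough way to see the effect is to average the cost over a mix of messages. The Python sketch below is arithmetic only, and every number fed into it is a made-up assumption: the article gives no prices, so costs are expressed in arbitrary units, and the share of messages that use the ensemble is a parameter you choose.

```python
def blended_cost(router_cost: float, ensemble_cost: float,
                 ensemble_share: float) -> float:
    """Average cost per message when only some messages use the ensemble.

    router_cost and ensemble_cost are hypothetical per-message costs in
    arbitrary units; ensemble_share is the fraction of messages (0 to 1)
    that take the ensemble path.
    """
    if not 0.0 <= ensemble_share <= 1.0:
        raise ValueError("ensemble_share must be between 0 and 1")
    return (1 - ensemble_share) * router_cost + ensemble_share * ensemble_cost
```

With an illustrative router cost of 1 unit, an ensemble cost of 6 units, and one message in five taking the ensemble path, `blended_cost(1.0, 6.0, 0.2)` comes out at about 2 units per message, compared with 6 units if every message ran the ensemble.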
Billing is reconciled against the exact engines that actually ran, so each charge reflects the real work done, plus a consistent AIR margin on every message, and nothing more. In practice, Air Max gives you flagship-grade answers on the questions that need them and cheap, fast answers on the ones that don't, without you ever having to switch engines yourself.
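
As an illustration of billing against the engines that actually ran, here is a small Python sketch. Everything numeric in it is hypothetical: the per-token prices are invented, and modelling the "consistent margin" as a fixed percentage is an assumption, since the article does not say how the margin is calculated.

```python
from decimal import Decimal
from typing import List, Tuple

# Hypothetical prices in USD per 1,000 tokens; not AIR's real rates.
PRICE_PER_1K = {
    "flash-fast": Decimal("0.0005"),
    "Gemini 3.1 Pro": Decimal("0.0040"),
    "Claude Sonnet 5": Decimal("0.0060"),
}

# Hypothetical fixed margin applied to every message.
MARGIN = Decimal("0.20")


def raw_cost(runs: List[Tuple[str, int]]) -> Decimal:
    """Sum the engine cost for each (engine, tokens) pair that ran."""
    return sum(
        (PRICE_PER_1K[engine] * tokens / 1000 for engine, tokens in runs),
        Decimal("0"),
    )


def bill(runs: List[Tuple[str, int]]) -> Decimal:
    """Charge = cost of the engines that ran, plus the same margin rate."""
    return raw_cost(runs) * (1 + MARGIN)
```

Under these assumed rates, a routed message that used 2,000 tokens on one Flash engine is billed only for that engine, while an ensemble message is billed for both engines it used. The margin rate is identical in both cases.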
