Kevin Smith
6 min read • 18 June 2026
🔗 Originally published on LinkedIn
For the last few years, AI has mostly been discussed as a model story.
Which model is smartest? Which one has the biggest context window? Which one writes the best code, answers the hardest questions, passes the toughest benchmarks, reasons most convincingly, or produces the least embarrassing hallucinations?
That made sense for a while. When a new technology arrives, we tend to stare at the brightest object in the room. With AI, that bright object has been the frontier model. GPT. Claude. Gemini. Grok. The race has been loud, expensive, impressive, and occasionally ridiculous.
But I think something else is starting to happen now. More quietly. Beneath the noise.
AI is beginning to move from a model era into an infrastructure era.
That might sound like a subtle shift. It is not. It changes how we think about cost, architecture, governance, security, software development, and the future shape of the technology industry itself.
The best analogy I have is cloud.
When cloud first arrived, the mental model was simple. Instead of buying servers, you rented them. That was the story. You moved from physical machines in a data centre to virtual machines running somewhere else. It was cleaner. Faster. More flexible. But it was still mostly infrastructure as we already understood it.
Then the layers started to appear. Object storage. Managed databases. Queues. Search services. Identity. Monitoring. Platform as a service. Serverless. Containers. Kubernetes. Private networks. Infrastructure as code. Cloud-native security. FinOps. Platform engineering.
Cloud stopped being “someone else’s computer” and became a whole operating environment for modern software.
That is where I think AI is heading.
Right now, we are still in the slightly chaotic phase. Everyone is experimenting. Everyone is building demos. Everyone is wrapping chat interfaces around internal documents and calling it transformation. Some of it is genuinely useful. Some of it is pointless. Most of it is somewhere in between.
But the shape of the next phase is becoming clearer.
The future is not one giant model answering every question. It is not a single magic brain in the sky. It is a stack. A messy, useful, evolving stack of services, models, agents, routers, gateways, memory systems, context tools, security layers, observability platforms and cost controls.
In other words, intelligence is becoming infrastructure.
You can already see the pressure building. Frontier models are no longer cheap toys. The companies building them are charging proper money because the cost of training and serving them is enormous. At the same time, enterprises are waking up to the fact that AI usage is not free, not simple, and not always easy to govern.
That changes the game plan.
You do not use a model capable of helping with frontier mathematics or drug discovery to summarise a meeting note. You do not send every classification task to the most expensive reasoning model available. You do not burn huge token budgets asking a giant model to rummage through a codebase when a smaller, more specialised model can do the searching first.
That would be like spinning up a GPU cluster to resize an image.
Technically possible. Architecturally absurd.
This is why Microsoft’s FastContext release caught my attention. On the surface, it is a relatively small thing: a lightweight repository-exploration model for coding agents. Its job is not to be the smartest coding model in the world. Its job is to scout ahead. It searches the code, reads files, identifies relevant locations, then hands back compact file paths and line ranges to the main agent.
That is a very different pattern.
The main model no longer has to do all the wandering. It does not have to fill its context window with exploratory noise. It receives cleaner evidence and can focus on the harder work of reasoning, planning and fixing.
That feels like a glimpse of where this is going.
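The scout pattern described above can be sketched in a few lines. This is not FastContext's API or method: the `scout` function below is a hypothetical stand-in that uses plain keyword search where a small exploration model would sit, and `Location`, `build_context` and the in-memory `REPO` are all names invented for illustration. What it shows is the shape of the hand-off, where the main agent receives file paths and line ranges rather than whole files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    path: str
    start: int  # 1-indexed, inclusive
    end: int    # 1-indexed, inclusive


def scout(repo: dict[str, str], query: str, margin: int = 1) -> list[Location]:
    """Hypothetical scout: keyword search standing in for a small model.

    Returns one Location per matching line, padded by `margin` lines.
    Overlapping ranges are not merged in this sketch.
    """
    hits = []
    for path, text in sorted(repo.items()):
        lines = text.splitlines()
        for number, line in enumerate(lines, start=1):
            if query.lower() in line.lower():
                start = max(1, number - margin)
                end = min(len(lines), number + margin)
                hits.append(Location(path, start, end))
    return hits


def build_context(repo: dict[str, str], locations: list[Location]) -> str:
    """Assemble only the referenced line ranges for the main model."""
    parts = []
    for loc in locations:
        snippet = repo[loc.path].splitlines()[loc.start - 1:loc.end]
        parts.append(f"{loc.path}:{loc.start}-{loc.end}\n" + "\n".join(snippet))
    return "\n\n".join(parts)


# A toy in-memory "repository" (assumed for the example).
REPO = {
    "billing/invoice.py": (
        "import math\n\n"
        "def total(items):\n    return sum(items)\n\n"
        "def apply_tax(amount):\n    return amount * 1.2\n"
    ),
    "README.md": "# Demo\nNothing relevant here.\n",
}

locations = scout(REPO, "apply_tax")
context = build_context(REPO, locations)
print(context)
```

The main model's prompt now carries three lines from one file instead of the whole repository, which is the point of the pattern.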
We are moving from “ask the model” to “design the system around the model”. That system may include a large reasoning model, but it will also include smaller models doing smaller jobs. One model may classify intent. Another may retrieve context. Another may compress a prompt. Another may check for sensitive data. Another may act as a local privacy filter. Another may evaluate the output before it reaches a user.
The expensive model becomes one part of a wider machine.
This is where small language models start to become interesting again.
A year or two ago, it was tempting to say the future of AI might belong to small models (I said exactly that myself). Then the frontier models kept improving, and that claim started to look naive. Bigger models were better at reasoning. Better at coding. Better at handling ambiguity. Better at general-purpose intelligence.
But maybe the original instinct was not wrong. Maybe it was just incomplete.
Small models do not need to replace large models to matter. They can surround them.
They can route. Filter. Scout. Summarise. Classify. Compress. Translate. Validate. Redact. Monitor. Evaluate. They can sit close to the data, close to the user, close to the workflow, doing bounded work quickly and cheaply.
The large model then does what it is actually good at.
This is a more mature way to think about AI. It is also a more enterprise-minded way to think about AI. Because once AI moves beyond experimentation, the questions change. It is no longer just “Can this model do the task?” It becomes “Can we run this safely? Can we afford it? Can we explain it? Can we audit it? Can we route it? Can we improve it? Can we stop it doing the wrong thing? Can we make it reliable enough to sit inside an important business process?”
Those are infrastructure questions.
That is why model gateways and proxies are becoming important. Tools like LiteLLM point towards a world where organisations do not tie themselves to one provider or hard-code one model. They introduce a control plane. Requests can be logged, routed, analysed, budgeted and governed. Different teams can use different models. Different workloads can have different policies. Spend can be tracked. Prompts can be inspected. Access can be limited.
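To make the control-plane idea concrete, here is a minimal sketch in plain Python. It does not use or imitate LiteLLM's actual interface. The `Gateway` and `TeamPolicy` classes, the model names, the prices and the whitespace-based token count are all assumptions made for illustration, and the backend is a stub rather than a real provider call.

```python
from dataclasses import dataclass, field
from typing import Callable


class ModelNotAllowed(Exception):
    """The team's policy does not permit the requested model."""


class BudgetExceeded(Exception):
    """The request would push the team past its spending limit."""


@dataclass
class TeamPolicy:
    allowed_models: set[str]
    budget_usd: float
    spent_usd: float = 0.0


@dataclass
class Gateway:
    usd_per_1k_tokens: dict[str, float]   # illustrative prices, not real ones
    policies: dict[str, TeamPolicy]
    backend: Callable[[str, str], str]    # (model, prompt) -> reply
    log: list[dict] = field(default_factory=list)

    def complete(self, team: str, model: str, prompt: str) -> str:
        policy = self.policies[team]
        # Access control: each team may only call the models in its policy.
        if model not in policy.allowed_models:
            raise ModelNotAllowed(f"{team} may not call {model}")
        # Budgeting: crude whitespace token estimate, prompt side only.
        tokens = len(prompt.split())
        cost = tokens / 1000 * self.usd_per_1k_tokens[model]
        if policy.spent_usd + cost > policy.budget_usd:
            raise BudgetExceeded(f"{team} would exceed its budget")
        reply = self.backend(model, prompt)
        # Spend tracking and logging happen only for requests that ran.
        policy.spent_usd += cost
        self.log.append(
            {"team": team, "model": model, "tokens": tokens, "usd": cost}
        )
        return reply


def fake_backend(model: str, prompt: str) -> str:
    """Stub provider so the sketch runs without network access."""
    return f"[{model}] reply"


gateway = Gateway(
    usd_per_1k_tokens={"small-model": 0.10, "large-model": 10.0},
    policies={
        "support": TeamPolicy(allowed_models={"small-model"}, budget_usd=0.001)
    },
    backend=fake_backend,
)
reply = gateway.complete("support", "small-model", "summarise this meeting note")
```

Every request passes through one place, so access rules, spend limits and the audit log live there instead of being scattered across applications.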
That sounds boring compared with a frontier model demo.
It is not boring.
It is what happens when a technology grows up.
The same thing happened with cloud. The early excitement was elastic compute. Then everyone discovered that identity mattered. Networking mattered. Cost management mattered. Deployment pipelines mattered. Observability mattered. Governance mattered. The surrounding ecosystem became as important as the raw compute itself.
AI will follow the same pattern, but faster.
There will be an AI FinOps layer because token spend is real money. There will be prompt security because prompt injection is not a theoretical problem. There will be context infrastructure because long context is useful, but careless context is expensive and dangerous. There will be memory systems because useful agents need continuity. There will be orchestration layers because multiple agents and models will need to collaborate. There will be evaluation platforms because you cannot improve what you cannot measure. There will be local inference because privacy, latency and cost will sometimes matter more than raw capability.
This is Cloud 2.0.
Not because AI replaces cloud. It will run on cloud, of course. But because AI creates a new infrastructure layer on top of cloud. A layer where intelligence becomes something we provision, route, monitor, secure and compose.
That is the shift.
Cloud turned compute into programmable infrastructure. AI is turning intelligence into programmable infrastructure.
Once you see that, a lot of things start to make more sense.
The rise of agents makes more sense. Agents are not just chatbots with tools. They are software processes that can interpret goals, use services, gather context, take actions, and coordinate work over time. That requires infrastructure.
The renewed interest in local models makes more sense. Not every workload belongs in a frontier API. Some tasks need to run near the data. Some need to be cheap. Some need to be private. Some need to be fast. Some simply do not justify the cognitive firepower of a frontier model.
The obsession with context makes more sense. Context is becoming a resource. It has cost. It has risk. It has quality. We will need systems that decide what context is relevant, what should be compressed, what should be remembered, and what should never be sent to a model in the first place.
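Treating context as a budgeted resource might look something like the sketch below. It is a simplification under stated assumptions: the `relevance` scores are taken as given (in practice they would come from a retriever), the `sensitive` flag is assumed to be set by an upstream check, tokens are estimated by splitting on whitespace, and compression and memory are left out entirely. `Chunk` and `select_context` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    relevance: float         # 0..1, assumed to come from some retriever
    sensitive: bool = False  # assumed to be flagged by an upstream check


def estimate_tokens(text: str) -> int:
    """Crude estimate: one token per whitespace-separated word."""
    return len(text.split())


def select_context(chunks: list[Chunk], budget_tokens: int) -> list[Chunk]:
    """Greedily keep the most relevant chunks that fit the budget.

    Sensitive chunks are never sent, however relevant they are.
    """
    chosen: list[Chunk] = []
    used = 0
    for chunk in sorted(chunks, key=lambda c: c.relevance, reverse=True):
        if chunk.sensitive:
            continue
        cost = estimate_tokens(chunk.text)
        if used + cost <= budget_tokens:
            chosen.append(chunk)
            used += cost
    return chosen


chunks = [
    Chunk("refund policy allows returns within thirty days", 0.9),
    Chunk("customer card ending 0000 on file", 0.95, sensitive=True),
    Chunk("office opening hours are nine to five on weekdays", 0.2),
    Chunk("refunds are paid to the original payment method", 0.8),
]

selected = select_context(chunks, budget_tokens=16)
for chunk in selected:
    print(chunk.text)
```

With a 16-token budget the two refund chunks fit, the low-relevance chunk is dropped for cost, and the sensitive chunk is excluded even though it scored highest.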
The growth of model routing makes more sense. A modern AI application will not ask “Which model do we use?” in the singular. It will ask which model, for which task, under which policy, at which price, with which latency, and with which level of confidence.
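One way to picture that question as code is a router that filters a model catalogue by policy, latency and capability, then picks the cheapest survivor. Everything here is hypothetical: the model names, prices, latencies and the coarse `capability` tiers are invented, and confidence-based escalation is left out to keep the sketch short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    name: str
    usd_per_1k_tokens: float  # illustrative price
    p95_latency_ms: int       # illustrative latency
    capability: int           # coarse tier: 1 = simple, 3 = hard reasoning
    runs_locally: bool


@dataclass(frozen=True)
class Task:
    kind: str
    min_capability: int
    max_latency_ms: int
    must_stay_local: bool = False  # a stand-in for a data policy


def route(task: Task, catalogue: list[ModelSpec]) -> ModelSpec:
    """Return the cheapest model that satisfies the task's constraints."""
    candidates = [
        model
        for model in catalogue
        if model.capability >= task.min_capability
        and model.p95_latency_ms <= task.max_latency_ms
        and (model.runs_locally or not task.must_stay_local)
    ]
    if not candidates:
        raise LookupError(f"no model satisfies the constraints for {task.kind}")
    return min(candidates, key=lambda model: model.usd_per_1k_tokens)


CATALOGUE = [
    ModelSpec("local-small", 0.0, 80, 1, runs_locally=True),
    ModelSpec("hosted-mid", 0.5, 400, 2, runs_locally=False),
    ModelSpec("hosted-frontier", 15.0, 3000, 3, runs_locally=False),
]

classify = Task("classify ticket", min_capability=1, max_latency_ms=200)
summarise = Task("summarise meeting", min_capability=2, max_latency_ms=2000)
prove = Task("hard reasoning", min_capability=3, max_latency_ms=10000)

for task in (classify, summarise, prove):
    print(task.kind, "->", route(task, CATALOGUE).name)
```

The frontier model is still in the catalogue, but it is only selected when nothing cheaper meets the requirement.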
That is a different architectural world.
And we are early.
Very early.
That is easy to forget because AI already feels huge. The products are everywhere. The headlines are relentless. The investment is enormous. The fear is real. The job anxiety is real too, and we should not dismiss it. Any technology that turns intelligence into infrastructure is going to change the shape of work. Some of that change will be exciting. Some of it will be painful. Some of it will be badly handled.
But if we step back from the noise, we can also see something remarkable starting to form.
A new software ecosystem is being born.
Not just new apps. New primitives. New platforms. New operating models. New categories of tooling. New patterns for designing systems. New ways to think about what software is.
For years, software has been mostly deterministic. We wrote code. The code executed. It did what we told it to do, assuming we had told it correctly.
Now we are introducing systems that interpret, infer, reason, retrieve, remember, collaborate and adapt. That does not remove the need for engineering discipline. It increases it. The more intelligent the system becomes, the more important the surrounding architecture becomes.
The magic needs machinery.
That is perhaps the simplest way to say it.
The first wave of AI made the magic visible. The next wave will build the machinery around it.
This is where the real work begins. Not in chasing every new model release. Not in pretending every business process needs a chatbot. Not in assuming the smartest model is always the right model.
The real work is learning how to design systems where intelligence is used deliberately. Efficiently. Safely. Economically. In the right places. At the right scale.
That is Cloud 2.0.
And if the cloud journey taught us anything, it is that the early abstractions are rarely the final ones. We are still inventing the language. Still discovering the patterns. Still finding out which parts become platforms, which become commodities, and which become the new foundations of how software is built.
Fifteen years from now, we may look back at this moment the way we now look back at early cloud. Necessary. Exciting. Slightly naive. Full of promise, but missing most of the machinery that would eventually make it real.
That machinery is starting to appear now.
FastContext is one small example. Model gateways are another. Context compression, local inference, agent orchestration, AI observability, memory, evaluation, guardrails and cost controls are all part of the same story.
The story is not that one model wins.
The story is that intelligence becomes a stack.
And once intelligence becomes a stack, everything built on top of it starts to change.
This article was originally written and published on LinkedIn by Kevin Smith, CTO and founder of Dootrix.