Kevin Smith
5 min read • 23 April 2025
🔗 Originally published on LinkedIn
We’re at that thrilling moment in the evolution of any technology where the chaos begins to crystallise. Patterns emerge. Standards take root. The stack stabilises just enough for teams to stop firefighting and start building with intention.
If the past two years were defined by prompting hacks, foundation model hype, and the dizzying rise of GPT-style interfaces, the next few years will be defined by architecture—specifically, modular, composable, agentic AI systems.
And right at the heart of that transition? A quietly powerful technique that’s fast becoming foundational: LoRA, or Low-Rank Adaptation.
At first glance, LoRA seems niche. Lightweight. A curiosity from the open-source crowd. But dig deeper and you’ll find it solving core challenges around scalability, specialisation, cost, and control—especially in enterprise AI and real-world agentic applications.
This article will unpack what LoRA is, where it fits, how it differs from other adaptation methods like prompt engineering, RAG, and MCP—and why it might just be the adapter pattern that defines the next phase of intelligent software.
LoRA stands for Low-Rank Adaptation—a method for fine-tuning large models without touching the whole model.
Instead of modifying or duplicating the base model’s billions of parameters, LoRA injects small, trainable matrices—adapters—into specific layers of the frozen model. You train just those lightweight components (~1-5% of the model size), then slot them in when needed.
Think of it as adding a plugin to a giant engine: the engine (the frozen base model) stays exactly as it is, while the plugin (the adapter) changes how it behaves for a specific job.
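To make that concrete, here’s a minimal sketch of the idea in PyTorch. The class name, rank, and scaling values are illustrative, not any particular library’s API:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Illustrative LoRA wrapper: a frozen linear layer plus a small,
    trainable low-rank update (B @ A), scaled by alpha / rank."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # freeze the original weights

        # The only trainable parameters: two small matrices.
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus low-rank adapter path.
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale
```

Because B starts at zero, the adapter begins as a no-op and learns its contribution during training; only A and B ever receive gradients, which is where the tiny parameter counts come from.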
LoRA was originally developed as a way to make fine-tuning large language models (LLMs) faster and cheaper. But it turns out to offer something far more powerful: a way to build flexible, domain-specific intelligence on top of shared foundations, without the cost, risk, and duplication of retraining the whole model.
This raises the natural follow-up question: why do we need LoRA when we already have prompt engineering, RAG, and MCP?
Aren’t those enough? Not really. LoRA doesn’t replace those techniques; it goes deeper. Let’s compare:
Prompting is great for quick behaviour shifts: setting a tone, enforcing a format, adopting a persona.
But it’s shallow. It doesn’t truly change the model’s reasoning, instincts, or latent knowledge. It’s like changing the vibe, not the brain.
RAG adds documents, databases, or vector stores that the model can refer to on the fly. It’s perfect for up-to-date facts or long-tail knowledge.
But again, it doesn’t touch the model’s actual parameters. The model can quote a document, but it can’t internalise a new skill.
The Model Context Protocol (MCP) allows agents and systems to interoperate intelligently: “This task needs legal tone + confidential data + special tooling.” It’s a coordination layer.
But MCP doesn’t train behaviour. It points to the right tool or context—it doesn’t create the behaviour itself.
Prompting is the frontend. RAG is the memory. MCP is the router. LoRA is the personality module.
Now we get to the good stuff—what LoRA unlocks that nothing else does as well.
Imagine building AI services for 500 clients. You want each one to have its own tone of voice, its own domain knowledge, and its own decision-making style.
Prompt templates won’t scale. RAG can’t encode tone or decision-making nuance. LoRA gives you a neat, portable adapter that is the client’s brain: small, cheap to train, and easy to swap in and out.
You can serve everyone from a single base model—and just hot-swap their custom LoRA on demand.
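With the open-source Hugging Face peft library, that pattern looks roughly like the sketch below. The base model name and adapter paths are placeholders, not real artefacts:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Placeholder model name and adapter directories.
base = AutoModelForCausalLM.from_pretrained("base-model-name")

# Attach one client's adapter, then register others against the same base.
model = PeftModel.from_pretrained(base, "adapters/client-a", adapter_name="client_a")
model.load_adapter("adapters/client-b", adapter_name="client_b")

# Route each request to the right "brain" by activating its adapter.
model.set_adapter("client_a")   # requests for client A
model.set_adapter("client_b")   # requests for client B
```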
This is huge: LoRA can inject entirely new capabilities into a model, from domain expertise and specialised tone of voice to new reasoning patterns.
You don’t need to retrain the full model. You just train a LoRA on your domain examples. This is especially valuable when the base model lacks your domain knowledge, is too large or expensive to fine-tune in full, or needs to stay frozen for safety and governance reasons.
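With the peft library, wiring this up looks roughly like the following sketch. The base model name and target modules are assumptions; you’d tune them for your own stack:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Placeholder base model; target_modules depend on the model's architecture.
base = AutoModelForCausalLM.from_pretrained("base-model-name")

config = LoraConfig(
    r=8,                                  # rank of the adapter matrices
    lora_alpha=16,                        # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # which layers get adapters
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically a small fraction of the base model
# ...then train on your domain examples with your usual training loop or Trainer.
```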
This is where LoRA and agentic computing collide beautifully. Imagine an autonomous agent navigating a complex workflow: it hits a step that demands deep legal expertise, so it pulls in a legal LoRA adapter on the fly.
Boom. Instant domain capability. No retraining. No downtime. This turns LoRA into runtime skill modules—plug-ins for cognition.
In the future, agents might maintain LoRA "skill trees", swap modules in and out, or even train their own adapters over time. You get dynamic capability injection, with guardrails and governance.
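A toy sketch of that routing idea, where the task labels, adapter names, and inference call are all hypothetical stand-ins:

```python
# Toy sketch of "runtime skill modules": the agent maps a task type to a
# LoRA adapter and activates it before calling the model. Task labels,
# adapter names, and the inference helper are illustrative assumptions.
SKILLS = {
    "contract_review": "legal_adapter",
    "claims_triage": "insurance_adapter",
    "incident_report": "ops_adapter",
}

def run_step(model, task_type: str, prompt: str) -> str:
    adapter = SKILLS.get(task_type)
    if adapter is not None:
        model.set_adapter(adapter)      # hot-swap the relevant skill module
    return model.generate_text(prompt)  # placeholder for your inference call
```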
LoRA isn’t just convenient. It’s strategically sound: training is cheap, the frozen base model stays auditable and safe, and each specialisation remains a portable asset you control.
This is why enterprise AI platforms are moving toward “LoRA-first” architectures: a single shared foundation model, with a library of client- and domain-specific adapters layered on top.
It’s clean, scalable, and secure.
If you zoom out and look at where LoRA fits in the broader AI architecture, a clear layered stack starts to emerge. Each layer has its own role, tools, and responsibilities—and LoRA is carving out its space as a critical middle layer.
At the very top, you’ve got the UX layer—the interface where users interact with AI. This is where prompt engineering lives: carefully crafted inputs designed to guide the model's behaviour without changing its internals.
Beneath that is the memory layer. This is where systems like RAG (Retrieval-Augmented Generation) operate, bringing in external documents, knowledge bases, or vector stores at runtime. These help the model respond more accurately and with up-to-date context, especially when its training data is out of date or incomplete.
Then comes the coordination layer. This includes tools like the Model Context Protocol (MCP), which allows agents to manage task routing, handle structured metadata, and make smart decisions about what to do next and how to do it. It’s the orchestration layer for multi-agent systems and dynamic workflows.
Now, right below that, we find the emerging adaptation layer—and this is where LoRA lives.
LoRA gives us a way to adapt the model’s internal behaviour without touching the foundation. It’s where you encode personality, domain expertise, or new reasoning patterns. If the UX layer is the voice, and the memory layer is the knowledge, then the adaptation layer is the mindset—the way the model thinks and behaves in different contexts.
At the bottom of the stack is the foundation layer—the large language model or diffusion model itself. This is the heavy engine doing the core computation and general reasoning. Crucially, this base remains frozen in most real-world use cases, especially for safety, cost, and performance reasons.
So, in this layered view: prompting shapes the interaction, RAG supplies the knowledge, MCP coordinates the work, LoRA shapes the behaviour, and the frozen foundation model does the heavy lifting.
It’s in this stack that LoRA stands out as a missing link—a flexible, reusable middle layer that allows for specialisation, control, and modularity without retraining or infrastructure sprawl. And as AI architectures mature, it’s this kind of clarity and modular separation that will enable teams to build at scale, with confidence.
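Compressed into code, a single request through that stack might look like the sketch below, where every helper is a stand-in for your own implementation rather than a real API:

```python
# Compressed sketch of the layered stack for one request.
# retriever, coordinator, plan and the model methods are all placeholders.
def handle_request(user_input, model, retriever, coordinator):
    context = retriever.search(user_input)           # memory layer (RAG)
    plan = coordinator.route(user_input, context)    # coordination layer (MCP-style)
    model.set_adapter(plan.adapter_name)             # adaptation layer (LoRA)
    prompt = plan.build_prompt(user_input, context)  # UX layer (prompting)
    return model.generate_text(prompt)               # foundation layer (frozen base)
```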
LoRA doesn’t grab headlines like GPT-4 or Sora. It’s not splashy, or magic, or futuristic. It’s just quietly enabling the next phase of intelligent software.
It’s how we move from one-size-fits-all general models to specialised, domain-aware intelligence, without rebuilding the foundations every time.
At Dootrix, and in the agentic future we’re helping shape, LoRA is going to be one of the tools we reach for most.
Because in a world of powerful general models, the real edge will come from the specific.
And LoRA is how we make AI specific—at scale, safely, and with elegance.
This article was originally written and published on LinkedIn by Kevin Smith, CTO and founder of Dootrix.