Codex: Lessons from the Trenches

I have not done serious hands-on coding in years.

As CTO of a growing software company, most of my time is consumed by strategy, delivery, hiring, leadership, client work, commercial decisions, P&L, and the endless stream of context-switching that comes with running a business. I still live and breathe software, but not in the way I used to. I am close to the work, but not in it. I know what good looks like, I know how to think about systems, I know the trade-offs, the patterns, the failure modes. I just do not spend my days writing code.

And then Codex happened.

Codex, specifically the macOS app, is an agentic coding assistant. It feels less like an IDE with some agentic bits bolted on and more like an engineering partner that can take on meaningful chunks of work, reason about a codebase, implement features, refactor aggressively, and run long-running tasks while you go and do other things.

What Codex unlocked for me was something I did not expect: it let me use all of my accumulated engineering judgement without needing to return to being a full-time implementer. I could be the architect again, but also the builder. Not by doing the work myself, but by orchestrating the work.

That word, orchestrating, is the right one. I was not “pair programming” with Codex. I was steering and directing it at a higher level.

And the result is that in a single week, between meetings, cups of tea, and a few stolen weekend hours, I built something that would previously have taken me months and months.

This is not a toy project

The project I used as my proving ground was an IoT application that integrates with Google’s Smart Device Management API. On the surface, it sounds simple: read temperature and humidity from a thermostat, set the target temperature, maybe show a nice dashboard.

But what I actually built was far more ambitious.

The system monitors the house over time and learns its thermal characteristics. It runs statistical modelling that can infer things like how quickly the house heats up or cools down, how much solar gain is affecting internal temperature, how long it will take to reach a given target temperature, and whether something unusual is happening that might indicate an anomaly. It can forecast heat-up and cool-down curves, project future states, and surface insights that look suspiciously like the kind of intelligence you get from commercial smart thermostat products. And it is not just running on my laptop in a folder called “side-project”.
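To make "how long it will take to reach a given target temperature" concrete, here is a minimal sketch of one such estimate. It assumes a simple Newton's-law-of-cooling model with the heating off, which is an illustration and not necessarily what the real system uses. The function names and the sample figures are hypothetical.

```python
import math


def fit_cooling_rate(times_h, temps_c, outdoor_c):
    """Estimate k in T(t) = T_out + (T0 - T_out) * exp(-k * t).

    Assumes a constant outdoor temperature and indoor readings that
    stay above it (otherwise the logarithm below is undefined).
    """
    # Linearise: ln(T - T_out) = ln(T0 - T_out) - k * t, then fit a line
    # by ordinary least squares and read k off the slope.
    ys = [math.log(temp - outdoor_c) for temp in temps_c]
    n = len(times_h)
    mean_x = sum(times_h) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in times_h)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(times_h, ys))
    return -sxy / sxx


def hours_to_reach(current_c, target_c, outdoor_c, k):
    """Hours for the house to cool from current_c to target_c.

    Only valid for cool-down: outdoor_c < target_c < current_c.
    """
    return math.log((current_c - outdoor_c) / (target_c - outdoor_c)) / k
```

With a fitted rate of 0.1 per hour, an outdoor temperature of 5°C and an indoor temperature of 21°C, this model predicts roughly two hours to drift down to 18°C. A real model would also need to account for heating input and solar gain.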

It is deployed properly. It sits on Google Cloud. It is provisioned with Terraform. It has a full deployment pipeline. It has automated testing. It has scheduled security and quality audits that run overnight and produce reports I can review in the morning.

In other words, this is not a toy. It is a distributed, data-driven, cloud-native system with machine learning at its core.

On my own, this would have taken months. I would have had to go deep into machine learning theory, tooling and experimentation before I could even begin to trust the model. With Codex, it took a week. Fitted around meetings, cups of tea and a few focused hours at the weekend.

In its current state, it is not at "demo/prototype" level anymore. It is real work, producing real artefacts, under real constraints.

The shift from builder to orchestrator

This is the part I think most people are missing.

Codex does not magically turn non-technical people into software engineers. But it absolutely changes what it means to be a technical leader. It creates a new mode of work where your value is no longer tied to how quickly you can type, or how many frameworks you have memorised.

Instead, the critical skill becomes clarity.

If you know what you are building, and you understand the architecture, and you have the instincts to smell a bad decision early, then Codex becomes an amplifier. It takes all the knowledge you have accumulated and turns it into execution velocity.

You stop being the person laying bricks and start being the person directing the build.

And for someone like me, who has been trapped for years in the paradox of being highly capable technically but permanently time-poor, it feels like being given access to a parallel version of myself. The version that still codes. The version that could still ship.

Lesson #1: your repo must be self-describing

The first hard lesson is that an agent is only as good as the world you build around it.

If your repository is vague, inconsistent, or under-documented, the agent will behave like a new developer who has joined your team without onboarding. It will make assumptions. It will miss context. The solution is not to micromanage every prompt. The solution is to create a project that explains itself.

The anchor of that is AGENTS.md. This is the file that Codex (or any of the agentic coding tools) wakes up to every time you start a conversation.

AGENTS.md is not just a list of rules or a style guide. It is the starting point for how the agent understands the project. It needs to define the conventions, the structure, the tooling, the testing approach, and the architectural intent.

But more importantly, AGENTS.md should not try to contain everything. It should be a map.

The most powerful pattern I found was to treat AGENTS.md as an index into a deeper /docs folder. A folder that contains architecture notes, design decisions, feature specs, trade-offs, known gotchas, and any “why we did it this way” thinking that normally lives in your head. For example, my AGENTS.md file explicitly tells Codex to record durable knowledge and learning outcomes from each round of feature development.

The moment you do that, the agent stops feeling like an unpredictable black box and starts behaving like a teammate who has read the wiki. And the moment you stop relying on the agent’s memory and start giving it written context, the quality of output jumps dramatically.
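An index is only useful if it points at things that exist. As a rough sketch, the check below scans AGENTS.md for links and reports any that do not resolve to a file in the repository. It assumes the index uses Markdown-style links such as [Architecture](docs/architecture.md), which is a guess about the layout; the function name is hypothetical.

```python
import re
from pathlib import Path

# Captures the target of a Markdown link, stopping before any #fragment.
LINK_PATTERN = re.compile(r"\[[^\]]*\]\(([^)#\s]+)")


def missing_doc_links(repo_root):
    """Return link targets in AGENTS.md that do not exist on disk."""
    root = Path(repo_root)
    text = (root / "AGENTS.md").read_text(encoding="utf-8")
    targets = LINK_PATTERN.findall(text)
    # Skip web URLs; only repository-relative paths are checked.
    local = [target for target in targets if "://" not in target]
    return sorted(target for target in local if not (root / target).exists())
```

Running something like this before a session, or as part of an overnight audit, catches the case where the map has drifted away from the territory.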

Lesson #2: spec-driven development is a superpower

The second lesson is that some form of spec-driven development is not optional anymore. I have written about this a few times already (see: A New SDLC for the Agentic Era).

This is not because I suddenly love documentation. I do not. It is because it is now 1) cheap to produce and 2) critical to success; agents thrive on stable intent.

My workflow became very structured. For each feature, I started a new Codex thread. I began in planning mode. I scoped the feature properly, iterated until it felt crisp, and only then did I move into implementation.

Every feature would produce an evolving set of artefacts, typically a spec.md and a design.md, living in a dedicated feature folder. Over time, this becomes a living specification of the entire product.
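A small helper can make the per-feature folder a habit and not something you have to remember. This is a minimal sketch only: the docs/features/<slug> location, the template headings, and the function name are all assumptions for illustration, not the layout used in the project described here.

```python
from pathlib import Path

# Hypothetical starter templates; adjust the headings to taste.
TEMPLATES = {
    "spec.md": "# {title}: spec\n\n## Problem\n\n## Scope\n\n## Acceptance criteria\n",
    "design.md": "# {title}: design\n\n## Approach\n\n## Trade-offs\n\n## Open questions\n",
}


def scaffold_feature(repo_root, slug, title):
    """Create a feature folder with spec.md and design.md stubs.

    Returns the paths that were newly created. Existing files are left
    untouched so an evolving spec is never overwritten.
    """
    folder = Path(repo_root) / "docs" / "features" / slug
    folder.mkdir(parents=True, exist_ok=True)
    created = []
    for name, template in TEMPLATES.items():
        path = folder / name
        if not path.exists():
            path.write_text(template.format(title=title), encoding="utf-8")
            created.append(path)
    return created
```

Calling scaffold_feature(".", "anomaly-alerts", "Anomaly alerts") at the start of a new thread gives the planning conversation somewhere durable to write its conclusions.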

This is one of those things that feels slightly overkill at first. Then you hit feature five or six and you realise what is happening. You are not only building the software, you are building a knowledge system too. You are creating a project that can explain itself, to both humans and also to agents.

That is when it starts to get pretty cool.

You can generate derivative documentation easily. You can create test plans. You can generate onboarding notes. You can keep architecture coherent. And when the agent returns to an older part of the system weeks later, it has a stable reference point that is not buried inside a forgotten conversation thread.

The spec becomes the spine of the project.

Lesson #3: discipline matters, because chaos is contagious

It is very easy to break your own process.

Once you have Codex up and running nicely, and you're in the thick of it, you start to get cocky. You want to make quick tweaks. You start a “small fixes” thread. You do a bit of UI work, then you drift into a refactor, then you decide to tweak an API, and suddenly you have created a chaotic everything-thread that has no clear scope and no stable context.

And the agent becomes less reliable. Sometimes it seems over-eager to please and starts changing code before the plan is settled. Almost like it starts just... hacking.

It is the same failure mode you get in human teams. When the work becomes unstructured, quality starts to erode. The difference is that with an agent, the erosion happens faster because the agent is optimising for completion rather than organisational coherence.

So the discipline of feature threads, scoped plans, and explicit artefacts is not about process for process's sake. It is survival!

Lesson #4: “collaboration handshakes” make agents fun

One of the best things I stumbled into was what I started calling collaboration handshakes.

These are small conventions we agree in AGENTS.md that define how we (me + Codex) work together. They are behavioural agreements.

My favourite example is something we called rapid mode.

By default, I wanted Codex to behave like a responsible engineer. Run tests. Run linting. Validate the build. Keep things safe.

But sometimes I am tweaking UI padding, adjusting layout spacing, nudging a margin, or experimenting with a component arrangement while the browser is open on my second monitor. In those moments, I do not want the agent to stop and run the entire test suite after every microscopic change.

So I introduced a simple handshake: when I say “rapid mode”, Codex temporarily stops doing heavy verification steps after each edit. When I turn it off, it goes back to being thorough.

It is a small thing, but it transforms the feel of working with an agent. Suddenly the interaction becomes less like pleading with a machine and more like collaborating with someone who understands when to be careful and when to move fast.

This is where the future gets interesting. These handshakes are basically early glimpses of what agent ergonomics will look like. We will all develop our own dialects.

Lesson #5: automate everything, and let the agents work while you sleep

The fifth lesson is that once you have agentic tooling, you should start thinking like a manager of an engineering team.

Because you now have one.

I created an /audit folder in the project and started scheduling tasks overnight. Code quality reviews. Refactoring recommendations. Security audits. Performance analysis. Test coverage reviews. Dependency checks.

While I sleep, the agents work. In the morning, I wake up to reports. This differs from a standard CI/CD pipeline because it feels dynamic, insightful and intelligent. The agents can also prepare changes for me to review, not just report on what they found.

This changes the rhythm of development. It creates a feedback loop where quality and refinement become continuous, not something you do “later” when you have time.

Lesson #6: autonomy is powerful, but boundaries are essential

I gave Codex full access to my machine early on, because frankly, that is where the real power is. If you constrain it too tightly, you lose the ability for it to install tools, run builds, execute scripts, or behave like a real engineer.

And most of the time, that worked really well. But I did have one moment early on that reminded me that caution is still warranted.

I had started a new project folder but had not initialised version control yet. I asked Codex to continue working on a plan. For whatever reason, it got the wrong end of the stick and decided it needed to locate something that did not exist inside the folder. So it started searching my entire drive.

Suddenly macOS was throwing permission prompts at me. Codex wants access to your desktop. Codex wants access to Apple Music (yes, Apple Music. And no, I have no idea). Codex wants access to random system folders.

I shut it down and we started again.

Nothing catastrophic happened, but the moment was a clear signal: if you give an agent autonomy, you need to give it structure first. Initialise your repo. Create your docs. Set your boundaries. Then unleash it.
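That checklist is easy to turn into a quick preflight check to run before handing a folder to an agent. The sketch below is only an illustration: the three required entries mirror the advice above, and the function name and messages are hypothetical.

```python
from pathlib import Path

# What must exist in the project folder, and what to say if it does not.
REQUIRED = {
    ".git": "version control is not initialised (run `git init`)",
    "AGENTS.md": "there is no AGENTS.md for the agent to read",
    "docs": "there is no docs folder holding durable context",
}


def preflight(project_dir):
    """Return a list of problems; an empty list means the folder is ready."""
    root = Path(project_dir)
    return [message for name, message in REQUIRED.items()
            if not (root / name).exists()]
```

If preflight(".") returns anything, fix it before starting a thread. It does not sandbox the agent, but it removes the most obvious reason for it to go looking outside the project.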

Agents are not dangerous because they are malicious. They are dangerous because they are eager to please.

Everything is different now...

The real story here is not that I built an IoT platform in a week.

The story is that the bottleneck in software development is shifting. Or has shifted. I'm not sure. The end of 2026 is going to look very different to the beginning.

For decades, we have treated implementation capacity as the limiting factor. How many developers do you have? How fast can they code? How quickly can you ship? Now, implementation is becoming cheap.

The scarce resource is becoming intent. Clarity. Taste. Architecture. The ability to define what should exist, why it should exist, and what trade-offs are acceptable.

Codex does not remove the need for engineering skill. It amplifies it. It punishes vagueness. It rewards precision. And it creates a world where people like me, technical leaders who have been pushed away from hands-on creation by the sheer weight of leadership responsibility, can suddenly build stuff again. Not by going backwards into the old role, but by evolving into a new one.

If you want a glimpse of where this is going, it looks less like a developer typing code and more like a director shaping a film. You define the scenes, the tone, the pacing, the constraints, the intent. The agents do the production work. You review, refine, guide, and keep the system coherent.

That is the future.

And it is much, much closer than most people realise.

👉 A New SDLC for the Agentic Era

This article was originally written and published on LinkedIn by Kevin Smith, CTO and founder of Dootrix.