A few weeks ago, the team at PostHog pointed an AI agent at their query engine, fed it slow queries from production, and let it run overnight. By morning it had found something that had been sitting there for almost three years: every query with a timestamp filter had quietly stopped using the database's primary key the way it should. The fix cut the data the database had to scan on the benchmark query by 62%.
The bug isn't the part I keep thinking about. The line that caused it wasn't a mistake. When they added per-team timezones back in 2023, wrapping every timestamp so the dates displayed correctly was the right call. It only became a problem slowly, as the system grew up around it. And nobody went back to it, because it worked — and because, as their engineer put it, it had become "the kind of code you stop seeing."
The agent noticed it for a fairly unglamorous reason. It had no history with the code. It looked at a line that had been there for three years exactly the way it looked at one written yesterday, with no particular reason to trust either.
I don't think this is really a story about databases.
It's about a kind of debt we don't have a good name for. Call it judgment debt: the slow cost of decisions that were right when you made them, and that no one has looked at again as the world underneath them changed.
Technical debt, at least, announces itself. You can feel it — the file nobody wants to open, the migration you keep putting off. There's a discomfort that tells you it's there. Judgment debt has none of that. The decision was sound. The reasoning was good. Nothing about it asks to be reconsidered.
That's what lets it accumulate so quietly. It doesn't feel like debt. It feels like something you settled a long time ago.
Technical debt makes the code expensive to change. Judgment debt makes you sure of something that quietly stopped being true.
The PostHog bug lasted as long as it did for three reasons, and I think all three show up in product work too.
Slow, not loud. It never paged anyone. It just made every query a little worse than it needed to be. Judgment debt behaves the same way. It doesn't break the product. It nudges a team toward building slightly the wrong thing, with complete conviction. Nothing sounds an alarm. The work still ships.
No baseline. Every query was affected, so there was nothing healthy to compare against. The same is true of a belief a whole team shares. When everyone takes the goal you set eighteen months ago for granted, no one in the room finds it strange. Agreement and blindness look identical from the inside.
The evidence is somewhere no one looks. The thing that would have revealed the bug lived in the output of a command nobody runs unless they're already suspicious. Product work has its own version of that: the early research, the support tickets, the usage data that would show the outcome you're still optimizing for stopped mattering a couple of quarters ago. It's all there. We're just not looking, because nothing has told us we should.
This is a different failure than the ones I've written about before.
A spec written from memory is wrong from the start — it never really grounded itself in what already existed. Judgment debt is almost the opposite. The spec was right at the start. It earned its approval honestly. It just didn't stay true.
And it isn't only about someone new undoing a decision they were never told about. That's a gap in what got passed along; the knowledge existed, it just didn't travel. Judgment debt is quieter than that, and a little more uncomfortable, because the people who made the call are often the ones who can see it least. They remember why it was right. And the memory does the work that fresh evidence should be doing.
For most of the history of software, judgment debt was held in check by how slow building was. When a feature took six weeks to ship, six weeks of reality leaned on the idea before it ever reached a person. The slowness was a kind of accidental review. You revisited decisions because you kept walking past them on the way to done.
Agents take the walk away. An approved spec can be on its way to a pull request the same afternoon. The assumption inside it never gets tested by time. It just gets built.
And the agent won't catch it for you, because that was never its job. It won't stop on the second outcome to wonder whether it still matters. It won't notice that a constraint you wrote last year quietly contradicts a ticket that came in last week. It builds what you asked for, faithfully and fast. That faithfulness is the whole point of it.
Faithfulness to a stale idea is still faithfulness. We've just made it much faster to act on.
There's a more hopeful way to read this, though.
The same quality that let the agent find a three-year-old bug — no history, no deference to how long a line had been sitting there — is exactly what you'd want turned toward your own intent. Not to find slow queries, but to ask the questions a team stops asking on its own. Which outcome have we never actually checked since we wrote it? Which edge case quietly became the main case? Which constraint made sense under last year's evidence, and has anyone looked since?
Something with no history might be the most honest reviewer of judgment a team can have, simply because it's the one reviewer that doesn't share the blindness.
But there's a catch, and it's really the whole thing. The agent could look at that timezone line because it was written down — as code, somewhere a reader could find it and weigh it again. You can't look again at judgment that only ever lived in someone's head. There's nothing there to point anyone at.
This is the part the instinct to "keep judgment human" tends to miss.
Getting intent out of your head and into something that outlasts the conversation — the objective, the outcomes, the constraints, the evidence that led to them — isn't only so the next person can read it. It's so that, later, you can find the places where it quietly stopped being true.
The practice is almost embarrassingly simple. Write down what you decided, and why. Then, every so often and without much ceremony, go back and ask which of it you still believe.
You can't pay down a debt you can't see. And judgment is the one kind most of us never wrote down to begin with.
The PostHog bug took three years and an offsite to find. It didn't have to. It was visible the entire time. It just lived in a place everyone had stopped looking.
Every product has a few of these. A decision that was right once, that no one really sees anymore, that someone with fresh eyes would notice in an afternoon — if there were only somewhere to look.
Most of the work, I think, is just leaving ourselves something to look at.
Don't Just Write Code. Define Intent.
Turn user friction into structured Intent Specs that drive your AI agents.
Get Started for Free