LLMs and performative productivity

Published: June 5, 2026
Last updated: June 13, 2026

At first, when I began to use AI[1] agents day-to-day, I was blown away by their capabilities, and by how much they elevated my own. Certainly, the floor was raised. I could do far more, much faster. That much was undeniable.

Armed with this newfound power, I accomplished a flurry of tasks I either wasn’t capable of, or didn’t have time for previously:

At work, I could get up and running in new codebases without asking for help, and could contribute to them much more easily—especially when they were in unfamiliar programming languages
I got some projects updated, moved, or refactored, all in record time (including a particularly gnarly Nuxt upgrade I’d been putting off for years)
I added several new features to a handful of apps here and there that I wouldn’t have otherwise
I scaffolded new things and built out greenfield projects in record time
I wrote more tests, faster than ever
I pushed out a whole bunch of bug fixes

That all sounds fantastic, of course. It felt fantastic.

But when I got done with all that, I had to wonder: could I really call all of that productive?

At work, I didn’t understand the codebases I was working in, and though I was contributing to them, I gained no real context about them. I was opening PRs, but I couldn’t really speak well to what was in them. I was constantly afraid I’d messed something up without realizing it, and I learned virtually nothing about the unfamiliar languages
Most of the other updates weren’t really needed. The changes just made me feel good, while making little to no difference on the user side of the software. (And even though I’d migrated to the newest version of Nuxt, all my Nuxt knowledge was still out of date.)
The new features weren’t actually being used
The greenfield projects were quickly abandoned
I didn’t really need the tests that badly, and I wasn’t quite sure whether they were doing anything worthwhile in the first place
I didn’t know what had caused the bugs, or what the fixes had been

I had mainly just checked off a bunch of old to-dos, most of which were unfinished because they never mattered that much in the first place.

And even where they did matter, I paid a price for doing more, faster. I added a bunch of abandoned side projects to the old pile, but unlike before, I didn’t even come away with any new skills or experience.

If anything, it seemed like I knew less than before.

Maybe the codebase improved, but I sure didn’t.

And that’s all when the agent worked well. Other times, I’d spend so long prompting and re-prompting it would’ve just been faster to do the work myself in the first place—but by that point, of course, I was so deep in the hole it seemed easier to just keep digging.

I had to ask: when the dust had settled, was it really all a net gain?

If I’m being honest with myself, what I was doing was often more theatrical than productive; lots of show, with not a lot to show for it.

But being honest with myself, it turns out, was actually a lot more difficult than it should have been.

Intellectually, I knew what I was doing was questionable. But emotionally, I loved it.

I loved using my agent like a guilty pleasure. I wanted to keep using it, any chance I got.

I could see I was trading away something valuable for something petty, like a kid blowing their allowance at a gumball machine.

But I still wanted what the machine had. It triggered something in my brain. I believed maybe if I used this thing enough, all the tiny little meaningless tasks would eventually add up to something important.

Sometimes I’d feel a compulsion to fire up Claude Code and have it work on something, even when I had nothing in mind to accomplish. At times, I’d even catch myself about to ask AI to help me skip past things in real life, it had become so habitual. Hey Claude, do these dishes for me.

I’m very familiar with that sort of compulsion, and I recognized it as soon as I stepped back: I wanted to play AI, like a video game. I craved more of that feeling; that dopamine hit of accomplishing things unbelievably fast. (Not unlike a video game, actually.)

All the work I had AI do for me could’ve been a fantastic learning opportunity. Instead, I mostly just traded my own potential growth for…a pile of junk, essentially. And I did it happily. Enthusiastically, even.

That’s when I started to wonder if AI was doing more for my feelings than for my actual productivity.

An assumption worth questioning

Maybe it’s never crossed your mind to question the idea that AI makes you more productive.

Maybe you’ve never thought to ask because you also felt as though the difference was obvious and undeniable.

Or, maybe you never thought to question the assumption because it just seems to be a widely accepted truth in this industry.[2]

Regardless: this assumption has always surprised me a bit, because it seems to come almost entirely from either anecdotal self-reporting, or from the companies selling AI. Extremely few good-quality, quantitative studies or surveys have even attempted to check, objectively: does AI make you more productive as a software engineer?

Among such attempts, I’m aware of none that’s come back with an unqualified “yes.” Whatever gains LLMs might offer, they’re always situational, and always come with tradeoffs.

Muddying the waters further: few take a holistic view of productivity. Often, all they measure is how fast participants can complete a simple coding exercise, or a basic greenfield “hello world” app. But the more they zoom out to real-world scale, the less benefit they report.

A lot of the supposed productivity gains from LLM usage seem to rely on a questionable definition of productivity.

But before we get to all that: let’s look at those studies I mentioned.

I’ve compiled a list of the notable studies and surveys I’m aware of, and paraphrased their findings below.

Studies on the impact of LLMs on developer productivity

Early this year, a study by Anthropic itself found AI usage offered statistically insignificant benefits, in exchange for a significant tradeoff in skills built on the job.[3] Similar studies in other areas, like this one, have noted the same effect; whatever speed LLMs might provide comes with a toll on cognition.
In a survey also from earlier this year, CEOs overwhelmingly admitted little to no correlation between AI adoption and company-wide productivity gains.
This 2026 report found that while AI increased the overall amount of work being done, it also lowered quality in several ways. Code that was eventually reworked or deleted increased nearly tenfold, while the likelihood of bugs and other incidents nearly tripled.[4]
A 2025 study found that using AI made developers feel 24% faster, when in reality it made them 19% slower.[5]
A late 2025 study (summarized here) found up to 40% gains—but only in low-complexity greenfield projects, and in exchange for less maintainable code. The gains vanished, or even went negative in existing codebases and/or higher complexity tasks, particularly when accounting for rework.
A 2024–2025 study found gains in individual productivity, but also found that they came with a shipping bottleneck, and a reduction in the stability of shipped code.
An MIT study from 2024 proclaimed a 10–20% boost, but only by defining productivity in terms of the absolute number of pull requests (PRs) opened, a highly questionable metric. (Even so, the study admits its findings were difficult to measure properly, and barely reached statistical significance.)[6]
Microsoft’s widely-cited 2023 study touting “meaningful boosts to productivity” actually only tested building a simple “hello, world” boilerplate app, nothing like real-world dev work. There are also several caveats about the quality of the work dropping as a side effect.
While not a study, the last few Stack Overflow developer surveys have shown AI usage increasing among developers, while trust in AI is actually going down. (In fact: more developers distrust AI now than trust it.) AI solutions that are “almost right, but not quite” were the #1 frustration cited by developers in the most recent survey, with a full two-thirds of developers saying they are spending more time fixing AI-generated code now than ever before.

The major takeaways

All of those studies take different approaches, but there are a few common threads in their findings I’d like to point out:

LLM productivity benefits are highly situational. LLMs excel at straightforward, time-consuming tasks, like boilerplate and greenfield projects. They also help less-experienced coders more than veterans. The further you go outside that sweet spot, the less benefit there is.
There’s a pronounced gap between perception and reality. This reaffirms my experience. LLM users feel like the tool is doing much more for them than it actually is, when measured objectively.
Even where the gains are real, they come at a cost. Most concerning: LLM usage inhibits cognition and understanding. When you outsource your chance to speak the language, you quickly develop cognitive debt.[7] Studies also confirm a code quality drop when using LLMs.
Most studies so far have only measured productivity at the individual level, and in a vacuum. Measurement tends to begin and end at authoring code. Rarely, if ever, is a broader, more realistic view taken. But in the rare instance where it is, positive impacts tend to evaporate.

This last point might be the biggest takeaway, in my mind.

The more you take a big-picture, holistic view of productivity, the more gains from LLM usage shrink, or even go negative.

Now, in fairness: many of those studies took place prior to 2026, and there may be reason to believe LLMs have progressed enough now that some of the results might have changed—or if not, that they will, eventually.

Even if that’s the case, however: we need to be very careful how we’re defining productivity, and how we’re measuring it. Because I feel like we’ve accepted an incomplete, inadequate definition of the word, in order to fit LLMs inside it.

Defining productivity

For any purpose where measuring productivity matters, you can’t just look at speed or volume. These are short-term metrics, and good work is often done slowly, and in small increments, in order to last long-term.[8]

But it seems like we don’t care anymore.

In fact, it feels like we’re actively being told to stop caring about any idea of productivity we might have agreed on prior to the advent of LLMs—or at least, to adjust it.

We’re told to stop writing code by hand, not because our work wasn’t good enough, but simply because…it isn’t as fast.

The focus has quietly (or perhaps loudly) shifted from the end product to the process—or at the very least, to quantity over quality—which, we agreed not so long ago, is very much backwards.

Many leaders are now overlooking results in favor of rubber-stamping workflows—which strikes me as the modern equivalent of measuring productivity by lines of code, or time spent at your desk.

These measures create predictably perverse incentives: I can easily churn out 10,000 lines of meaningless code, or sit at a desk 12 hours a day without getting anything at all done. And as we’ve recently seen, Amazon workers can burn through staggering quantities of tokens on nothing productive at all.

Whether LLM code is as good as human code is partially load-bearing here. After all, if the machine can write code as well as humans (or even close to it), why not do that faster?

So let’s poke at that notion a little, before moving on. (Because ultimately, it doesn’t really matter much, if we’re not focused on the right things in the first place.)

Five reasons to question LLM code quality

The technology is moving quickly, and it’s conceivable that at some point, LLMs may catch up to or overtake humans in code quality. At this point, however, it’s fair to say that hasn’t happened yet, for a few reasons:

The studies above overwhelmingly point to a drop-off in quality and reliability, relative to human control groups. Maybe that changes in the future, but it seems to be the truth for now, at least.[9]
LLMs were trained on average code, and thus generally have average outputs. They’re likely to throw React/Next.js at every problem, even when that’s a terrible decision, just because that’s what most mediocre developers would do.[10] Maybe the case could be made that LLMs elevate the average, but they’re definitely not better than a good developer with specialized knowledge.
LLM output is non-deterministic. While that may not matter in some cases, it still means you’re rolling the dice, to some degree. You either believe all implementations are effectively equal (which seems unreasonable), or you believe that matters.
Humans will inevitably have a more comprehensive understanding of the organization, the team, the problem space, the history, the users, and other things that might exist outside the codebase and beyond an LLM’s context window (possibly even when they’re provided).
If LLM code was better than human code, we could logically expect the people who care most about quality to be happiest about it. But they aren’t.

Think about it: if LLM code was reliably of higher quality than human code, then LLMs would be making software more accessible, more maintainable, more performant, more usable, and more reliable. And I promise you: the people who care about those things would be absolutely thrilled about that!

Accessibility advocates, open source maintainers, performance engineers, UX workers, reliability engineers, support teams, QA—we could expect them all to be elated, if LLMs were actually moving the metrics they care about.

But that’s pretty much the opposite of what’s happening.

Instead, everywhere I look, specialized craftspeople are overwhelmingly burned out from fighting a losing fight to get people to care—a fight that is, in many ways, harder now than ever.

I’ve lost track of how many exceptional developers I’ve heard admitting they feel like they’re being driven out of the industry because, essentially, they didn’t stop caring when everybody else seemed to.

If LLMs were actually writing such good code, we’d be seeing better software by now.

But we don’t. In fact, many people seem to agree it’s worse than ever. Across the board, the best software seems to have gotten worse. And while you could maybe argue bad software is better now than it would’ve been, it’s also proliferated tenfold.

Reductionism

Some might look at this the opposite way, and say it’s not that LLM code is so great; it’s that humans also make mistakes. We’re a low bar to clear.

That’s fair. We’ve all messed up. Most of us have taken prod down at one point or another. But I don’t think that’s a very good reason to blast out suboptimal code, for a couple of reasons:

Nobody treats human code with such indifference. I’ve never once, in over a decade of writing code, had anyone express such low expectations of me, or treat my mistakes with such blasé detachment (no matter how fast I made them). So this is an obvious double standard.[11]
Mistakes are how humans learn. When something goes wrong because of us, there’s a benefit; we discovered something about our codebase that made us wiser. We gained resilience. We leveled up. We probably helped other people learn along with us, too.

A junior who made a mistake is one step closer to being a senior; a junior who let an LLM make a mistake (and had the LLM fix it for them) has probably learned nothing.

Some might also argue the reduction in quality is worth the bump in speed, which I suppose may be reasonable in some cases (but certainly not all).

But never mind that.

Let’s set aside code quality for a minute; ignore all the points above, and assume code written by an LLM is always at least as good as human-authored code, if not better.

Even in that scenario, we still have a whole bunch of problems to deal with before we can actually consider what we’re doing productive.

Code has never been the bottleneck

Writing code generally isn’t what slows teams down, and has never really been the hard part of software engineering at all. I’ve never, in over a decade in tech, heard even the most jaded CTO cite “typing speed” as a major blocker.

The job is so much more than that. There’s endless judgment, communication, and discernment that goes into the work.

It’s evaluating different approaches and weighing tradeoffs. It’s talking to the right people on five different teams to make sure everyone’s in alignment. It’s figuring out if what you’re building is actually the right implementation of the right solution. It’s design. And no matter how fast you can churn out code, you can’t skip past that part.

Even if you take the rosy view that making the code part faster simply clears room for more of everything else: I doubt it actually works that way in practice. We can all only give so much in a day, and swapping one mentally taxing activity for another doesn’t actually raise that ceiling; it just shuffles where you’re spending your energy.

Besides: PRs still need to be reviewed, don’t they? (Please say they still need to be reviewed.)

If you’re opening PRs faster than anyone can read through them, you’re not increasing productivity; you’re clogging the bottleneck. That’s closer to sabotage than it is to productivity.
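To put rough numbers on that, here is a toy model of a review queue. It is only a sketch: the function name and every figure in it are made up for illustration, and it assumes reviewers clear a fixed number of PRs per week no matter how many are opened.

```python
def review_backlog(opened_per_week: int, reviewed_per_week: int, weeks: int) -> list[int]:
    """Return the number of unreviewed PRs left at the end of each week."""
    backlog = 0
    history = []
    for _ in range(weeks):
        # New PRs join the queue; reviewers clear what they can.
        backlog = max(0, backlog + opened_per_week - reviewed_per_week)
        history.append(backlog)
    return history


# Hypothetical team: reviewers can get through 10 PRs a week.
print(review_backlog(opened_per_week=8, reviewed_per_week=10, weeks=4))   # [0, 0, 0, 0]
print(review_backlog(opened_per_week=20, reviewed_per_week=10, weeks=4))  # [10, 20, 30, 40]
```

In this sketch, more than doubling authoring speed ships nothing extra. Merged work is capped by review capacity either way, and the only number that grows is the pile waiting to be read.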

But just for the sake of argument, let’s say all your PRs are full of great code, they all coast through review quickly and seamlessly, and they all get merged into prod without issue.

Even then, the measure of your productivity can’t be taken yet. In fact, it’s really only beginning.

The cost of maintenance

You’re not done once you’ve merged the code and it’s running in production. You still have to maintain the code, fix any bugs that might pop up, manage updates, and all the other stuff that comes with ownership, now that it’s being tested at real-world production scale.

The more you push out, the more you have to maintain. The more you add, the more complex your software becomes—which in turn, remember, lowers the effectiveness of LLMs, and makes adding new features harder. Inevitably, this means more and more of your time is spent working on and around code you’ve already “finished,” as it boomerangs its way back to you.

This effect isn’t unique to LLM usage, of course, but it’s more acute. The more code you merge, the more little zombie tickets start punching their way out of the ground to shamble back for another bite of your brain—and the less you’ll actually understand what’s going wrong and why, if you outsourced your mental grasp of the system. So you’re more likely to cause future trip-ups, and the cycle accelerates.

But whether or not you’re using an LLM, this still happens even if the code itself is great. Even excellent code adds overhead. So shipping out amazing PRs at record pace, even in a best-case scenario, offers only diminishing returns on overall productivity, because the more work you do, the more work you have to keep doing.
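A back-of-the-envelope simulation shows the shape of those diminishing returns. Everything here is an assumption chosen for illustration: a fixed 40 hours of capacity per week, a flat one hour of weekly upkeep per shipped feature, and build costs of 10 hours by hand versus 2 hours with an agent. Real maintenance costs are lumpier than this.

```python
def features_shipped(capacity_hours: float, build_hours: float,
                     upkeep_hours: float, weeks: int) -> list[float]:
    """Features shipped each week, after upkeep on earlier work is paid first."""
    shipped_total = 0.0
    per_week = []
    for _ in range(weeks):
        # Maintenance on everything already shipped comes off the top.
        free_hours = max(0.0, capacity_hours - shipped_total * upkeep_hours)
        new_features = free_hours / build_hours
        shipped_total += new_features
        per_week.append(round(new_features, 2))
    return per_week


# Hypothetical: 40 hours a week, 1 hour of weekly upkeep per feature.
print(features_shipped(40, build_hours=10, upkeep_hours=1, weeks=4))  # [4.0, 3.6, 3.24, 2.92]
print(features_shipped(40, build_hours=2, upkeep_hours=1, weeks=4))   # [20.0, 10.0, 5.0, 2.5]
```

Under these assumptions the faster builder is shipping fewer new features per week than the slower one by week four, because both are heading toward the same ceiling of 40 maintained features (capacity divided by upkeep). Building faster only gets you to that ceiling sooner.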

No matter how fast you can build something, maintaining it is still an ongoing cost.

Just because the floor was raised, it doesn’t mean the ceiling disappeared.

This is probably why so many vibe-coded apps are abandoned nearly as soon as they’re built: building is fun, and practically free (for now); maintenance is a slog, even with agents doing it.

It’s probably also why, even though it’s trivial to slop-fork pretty much anything you want, most people don’t seem to be doing it: because the moment you do, you’re on the hook for all the maintenance and updates.

Is your team prepared to shepherd the code, if it proliferates by an order of magnitude beyond its current scale? Are you planning for maintenance and complexity to increase in proportion with the throughput?

What bugs and unforeseen side effects are hiding in the code that you haven’t found yet? What happens if (when) those grow exponentially along with output?

Are you accounting for all of that in your view of productivity?

Or are you just looking at how fast it got built the first time?

Speeding in the wrong direction

Let’s say your LLM authored great code, super fast. It slid through review and was merged into prod successfully. It’s out there in the wild, and it’s so good, you don’t even have to touch it. It’s bug-free and pristine.

Great! You still might have built the wrong thing.

No matter how fast you’re able to ship code, it doesn’t make what you build intuitive, cohesive, or even useful in the first place. It could still be wrong for your organization, or your app, or your users. It might look great on paper, but fall apart in the real world. Or, it might even be a good idea, with a bad implementation.

We’ve seen an exponential explosion in the amount of software created over the past few years, but outside of AI itself, it doesn’t seem like much has changed for most people.[12] We’re building way more things than ever before, but with very few exceptions, nobody really seems to be using them.

I have a theory why this might be:

Building things got cheap, but building the right thing didn’t get any easier.

Even perfect code can still make bad products. And if all you did was build the wrong thing fast, could you really call that being productive?

Maybe. Maybe if you’re just using AI to launch things at the wall as fast as possible to see what sticks, you could call faster failure productive. But even then, it’s worth asking: how long is that approach actually a net gain?

How many fast, cheap iteration cycles will you burn through before it would’ve been more efficient to just do it slowly and methodically from the start? Do you know what that number is?
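For what it's worth, the arithmetic behind that number is simple once you have honest estimates. The sketch below uses invented figures and a hypothetical helper name, and it assumes failed attempts teach you nothing that shortens the next one, which is pessimistic but keeps the math clear.

```python
import math


def cycles_before_break_even(fast_cycle_hours: float, methodical_hours: float) -> int:
    """Largest number of fast cycles that still costs no more than the methodical route."""
    return math.floor(methodical_hours / fast_cycle_hours)


# Hypothetical: each quick attempt eats 6 hours end to end (prompting, reviewing,
# testing, discarding), while the slow, deliberate version would take 40 hours once.
print(cycles_before_break_even(6, 40))  # 6
```

With those made-up inputs, the seventh throwaway attempt is where the fast approach starts costing more than doing it carefully would have.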

But this is all assuming AI actually speeds up the iteration loop in the first place, and there’s good reason to question even that.

The iteration loop

While the idea that AI lets you iterate more quickly seems like a sound theory on paper, it might not always work out that way in practice.

Yes, you’re moving faster before the point you realize it’s not worth going further and you move on to the next iteration…but that point moves later in the process.

Previously, you might’ve stopped before taking on too much—say, before implementing the login system—because you would’ve paused to consider whether it was worthwhile to invest so much time and effort. You might have decided it wasn’t. Every addition comes with a cost, and so you get a natural chance to reevaluate, and potentially discover when things aren’t working and it’s time to try something new.

But when using AI, because that investment is far more trivial, instead of bailing out, you’ll likely just keep going a lot longer than you would have otherwise—maybe even past the point where the initial gains are wiped out.

There’s a good chance you won’t actually iterate faster using AI; you might just build much more fleshed-out failures, in roughly the same amount of time.

All the points above are why shipping some code is only the beginning, when it comes to measuring your actual productivity; your work is just barely entering the crucible. It’s untested. So gauging your productivity at that point is at best premature, and at worst, an entirely inaccurate measure that threatens to produce dangerously bad signals for your organization.

How do you know you can trust yourself?

Remember the study above, where engineers said using AI made them 24% faster, when in reality, it made them 19% slower?

Here’s what I need everyone reading this to internalize: you’re not any different from those engineers. I’m not either. (See this post’s intro.) None of us are.

None of those engineers knew there was a 43-percentage-point gap between their perception and actual reality. They all sincerely believed AI was helping them, not hurting.

And if you haven’t actually measured, quantitatively and objectively, you don’t know what your gap is, either—no matter how much your brain firmly insists you do. (And your brain is, most likely, insisting very hard right now.)
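The gap itself is easy to compute once you have both numbers; the hard part is collecting the measured one. As a minimal sketch, the helpers below are hypothetical and the task timings are invented to reproduce the 19% figure quoted above. A real comparison would need many comparable tasks, not a single pair.

```python
def pct_faster(baseline_hours: float, with_tool_hours: float) -> float:
    """Measured speedup in percent; negative means the tool made the work slower."""
    return round((baseline_hours - with_tool_hours) / baseline_hours * 100, 1)


def perception_gap(perceived_pct: float, measured_pct: float) -> float:
    """Gap, in percentage points, between how fast it felt and how fast it was."""
    return perceived_pct - measured_pct


# Invented timings: a task that takes 10 hours unaided took 11.9 hours with the tool.
measured = pct_faster(baseline_hours=10, with_tool_hours=11.9)
print(measured)                      # -19.0
print(perception_gap(24, measured))  # 43.0
```

The point of writing it down is that `perceived_pct` comes from your gut and `measured_pct` has to come from a clock. Only one of them is available without doing the work of measuring.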

Here’s another statistic: in the 2026 State of AI survey, 64% of developers said AI tools have made them significantly more productive.

However, in that same survey, 68% of developers said AI reliance makes developers less skilled.

It’s easy for us to see the negative effects of something when we’re looking at other people. It’s much harder to see them when we’re evaluating ourselves.

LLMs are made by for-profit companies to be addictive.

One of the most effective ways they do this is by making you feel productive, even when you’re not.

Crucially: consciously knowing this does not change your susceptibility, any more than knowing you’re a heroin addict makes you immune to opioids. That’s why objective, quantitative measurement, with a holistic definition of productivity, is so important.

You probably won’t even notice all the creeping technical and cognitive debt as it weighs you down, because by that point, you’re most likely not thinking of it in those terms.

It’s so difficult to spot the downsides of LLM usage because we’re psychologically inclined to feel that initial positive burst, and to ignore the dozens of tiny paper cuts that follow—even when they’ve bled the original gains away, drip by drip.

But even if you do see it happening, notice your incentives are all pointing in the wrong direction by that point. Now that parsing the code is much harder than it would’ve been before (because you wrote none of it), and now that maintenance is harder (because you have much more code to maintain and work around), sunk cost pushes you further down the path of least resistance.

Faced with the decision to go back and do things a better way, or just press the button one more time to apply another layer of patch code you never read and don’t understand, all while staring down an ever-increasing backlog, all the inertia is pushing you further down the same path that got you here.

Worse: there’s now a pressure to keep the agent busy. It feels like you’re wasting time if you don’t. So the priority shifts from building the right thing, the right way, to simply burning all the tokens you can.

Compounding all of this: the less you actually take the time to read what the LLM is producing, the more you’ll be tempted to trust it unquestioningly—and the less you’ll be able to detect when it’s doing something wrong.

The less you understand, the more you trust AI. But the more you trust AI, the less you understand.

Throughout this whole process, however, you’ll probably still feel incredibly productive—even when the data would suggest you’re lying to yourself—because it’s tough to tell the difference between being busy and being productive if you never take a step back and measure.

But you probably don’t do that, if you’re in this deep. Because you probably trust how you feel too much to believe reality could possibly contradict you.

It’s much easier to just fudge the definition a little bit; I’ve generated a lot of code, so I’m productive. My agent’s always working, so I’m productive.

I’m stressed and busy, so I must be productive.

Despite all of this: I still believe AI agents can be massively useful tools for software development, when wielded carefully, in appropriate situations.

The problem is: they’re not shaped for judicious application (and they’re most certainly not marketed that way); they’re shaped to maximize your usage. And they work exceptionally well.

Productive for whom?

There’s one more aspect to productivity I’ve alluded to throughout this post, but have yet to address directly: what about the parts of the work that are productive for you, personally?

Sure, shipping things for your company is well and good. That’s what they pay you for, after all. But there’s an implicit (if not explicit) part of the deal where you get knowledge and skills, too.

You’re meant to learn from your work. The company is supposed to be training you and helping you grow, just as you’re supposed to be training those below you and helping the company grow.

Once you’ve been a junior long enough, if things are working the way they should, you’ll have built up enough skill and knowledge to jump to the next level, where you’ll once again start low with the expectation of further growth and development.

That’s why many developers take on side projects that aren’t strictly work-related, profitable, or even practical; to learn! There’s massive intrinsic value in doing the work, even when nothing otherwise meaningful comes of it. (In fact, doing is one of the only ways it’s even possible for humans to learn and grow.)

You’re supposed to get more than just pay from your job. It’s supposed to come with the promise of growth and potential advancement.

What happens to that system, when nobody’s actually gaining understanding from their work anymore? When there’s nobody in the building who actually knows how it all works, and everyone’s just trusting whatever an agent says?

What happens when nobody can tell whether the agent’s right?

Where will senior engineers come from, when juniors haven’t learned a single thing on the job in the last three years?

Sure, making juniors better and raising the overall average is great—on paper, at least—but we still want more than average, don’t we?

For whom is this all productive, exactly?

Not for me, if I’m not advancing and honing my skills. Certainly not for the juniors below me.

Not for your company, if it’s severing its own developer pipeline, and all its engineering output is essentially a regression to the mean.

If all we’re doing is using the same AI tools every other company’s also using, our software is nothing special under the hood. In that case, we’d need to lean heavily on other differentiators; marketing, design, customer service, etc.—

—But we’re not. We’re outsourcing all of those to AI, too.

Why is anybody going to care about the company when everything about it is exactly the same homogeneous AI output every other company has?

The main winners in a gold rush are the ones selling pickaxes, and it sure seems to me like the token vendors are about the only ones who really stand to gain from most of this, if we’re adhering to any coherent, holistic definition of productivity.

But remember: LLM costs are heavily subsidized right now, and the prices won’t (can’t) stay this way. (And this is all to say nothing of other LLM-specific risks, like prompt injection, model collapse, and outages, just to name a few.) If and when that whole bubble bursts and token costs skyrocket: will you still have any idea what’s going on in your codebase?

Will your skills still be valuable, in a scenario where agents cost 10–20 times what they do now?

I’ve strayed into prediction. But no matter what the future holds: be careful how you measure your own productivity—and be sure you are measuring it. You can’t trust yourself if you don’t.

Most of all, though: be careful what you’re trading for it.

Epilogue

This isn’t really the conversation I want to be having.

Whether or not AI actually works as advertised isn’t really the right question in the first place.

I’ve written previously about many of my other concerns with AI (mainly: that it stands to profit extremely few already obscenely privileged people, at the great expense of basically all the rest of society). But it seems my industry has broadly disregarded all those other concerns, in favor of chasing the siren song of productivity.

So while calling out all of those other, far more important issues remains imperative: it’s clear what matters most to many of the decision makers in my corners of the world. And so I will poke my holes in the shaky premise they believe in, if it might help to steer the conversation back to where it ought to be.