Dark Code: The Silent Rot AI Accelerated
Your codebase is full of code that no one understands, no one owns, and no one can safely change. AI did not create this problem. It poured gasoline on a fire that was already burning.
Every engineering team has a number they track. Velocity. Sprint points. Deployment frequency. But almost no one tracks the one metric that predicts whether the codebase will still be maintainable in two years: how much of the code has gone dark. Still executing in production, passing CI, shipping to users, and understood by no one currently on the team.
The data tells a consistent story. Analyses of hundreds of millions of lines of code show that AI coding assistants have accelerated the rate at which codebases fill with duplicated, unreviewed, insecure, and incomprehensible code. The compounding effects are already measurable.
Your Codebase Is Rotting Faster Than You Think
Fifty Percent More Clones, Less Than Half the Refactoring
In early 2023, GitClear published an analysis of over 150 million lines of code, documenting the first measurable signs that AI coding assistants were applying downward pressure on code quality metrics1. The trends were concerning but still emerging. Developers were accepting more AI-generated suggestions, and the ratio of moved and updated code (the structural maintenance that keeps a codebase healthy) was beginning to decline. At the time, the shift looked like a trade-off. Teams were moving faster, and if the code was a little rougher around the edges, that seemed like an acceptable price.
Two years later, the picture got worse. GitClear’s 2025 follow-up, covering 211 million lines across thousands of repositories, found that the trends did not stabilize. They accelerated2. What looked like a temporary dip in 2023 turned out to be the start of a sustained decline in code maintenance.
Copy-pasted code rose from 8.3% to 12.3% of all code changes, an increase of nearly 50% in clone density that coincides precisely with the widespread adoption of AI coding tools2. This is the predictable outcome of tools that make generating new code nearly effortless while offering no help with the harder work of integrating, consolidating, and maintaining what already exists.
But the cloning epidemic is only half the story. The other half is what stopped happening.
Refactoring (restructuring existing code without changing its behavior) collapsed from 25% of all code changes to less than 10%2. That is not a rounding error. Developers are generating more new code than ever before while performing a fraction of the structural maintenance required to keep that code coherent. When it is faster to generate a new function than to find and improve the existing one, the rational choice, at least in the short term, is to generate. And so the clones multiply.
When the rate at which duplicated code enters a codebase rises by half while structural maintenance is cut by more than half, you are not building software. You are accumulating it. Every cloned function is a future divergence point where a bug fix in one copy will not propagate to the others. Every skipped refactoring is a load-bearing wall you chose not to inspect. The codebase grows, but its coherence does not grow with it. Eventually the gap between volume and understanding becomes the defining characteristic of the system.
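The clone-density signal GitClear reports at industry scale can be approximated locally. A minimal sketch, assuming Python source and treating functions with structurally identical bodies as clones; this is a deliberately crude heuristic, not a production clone detector:

```python
import ast
import hashlib
from collections import defaultdict

def clone_groups(source: str) -> dict[str, list[str]]:
    """Group function names by a hash of their normalized bodies.

    Functions whose bodies are structurally identical (same statements,
    ignoring formatting and comments) land in the same group -- a crude
    proxy for the clone-to-unique ratio.
    """
    tree = ast.parse(source)
    groups: dict[str, list[str]] = defaultdict(list)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            # ast.dump normalizes away whitespace and comments; hashing
            # only the body means renamed copies still hash identically.
            body_repr = ast.dump(ast.Module(body=node.body, type_ignores=[]))
            digest = hashlib.sha256(body_repr.encode()).hexdigest()
            groups[digest].append(node.name)
    # Keep only groups with more than one member: those are the clones.
    return {h: names for h, names in groups.items() if len(names) > 1}

sample = """
def charge_stripe(amount):
    total = amount * 100
    return total

def charge_adyen(amount):
    total = amount * 100
    return total

def refund(amount):
    return -amount
"""
print(clone_groups(sample))
```

Run against a real repository, the interesting number is the trend: if the share of functions landing in multi-member groups climbs sprint over sprint, clones are entering faster than they are being consolidated.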
One in Four Lines of AI Code Ships with Exploitable Vulnerabilities
If code duplication were the only cost, teams might absorb it. Clones are ugly but not inherently dangerous. The problem extends past structural messiness into something with real operational risk: security.
An empirical analysis of AI-assisted code generation found that AI-generated code contains 23.7% more security vulnerabilities than human-written code3. That alone would be a serious finding. What makes it systemic is the human behavior surrounding it: 89% of junior developers accepted AI-generated code without meaningful review, and 76% of all developers surveyed reported bypassing security review entirely when using AI tools3. The code is less secure, and the people receiving it are less likely to notice.
Independent research confirms the scale. AppSec Santa’s 2026 analysis found that 25.1% of AI-generated code contains exploitable security vulnerabilities4. Not theoretical weaknesses or lint warnings, but flaws that an attacker can leverage in a running system. One in four lines of AI-assisted code ships with a door left open. And unlike code clones, which degrade maintainability gradually, a single exploitable vulnerability can compromise an entire system overnight.
The window to find and fix these vulnerabilities is shrinking at the same time the volume is growing. Mondoo’s 2026 State of Vulnerabilities report documented 48,175 CVEs in 2025 alone, with 192,742 malicious packages detected in the wild5. The time from vulnerability disclosure to active exploitation has collapsed to just five days5. That is not a window for a patch cycle. That is barely enough time to triage, let alone remediate, and it assumes your team even knows the vulnerable code exists.
AI tools generate code with measurably higher vulnerability density. Developers accept it with less scrutiny. Attackers exploit it faster than ever. And human review, the one thing that could break the cycle, is exactly what AI tools are eroding.
The Comprehension Crisis No One Is Talking About
The damage to code quality and security is visible if you look at the data. But there is a quieter effect that gets almost no attention: AI coding assistants are degrading developers’ ability to understand the code they are responsible for.
Anthropic published a randomized controlled trial with 52 professional software engineers. Developers who used AI assistance scored 17% lower on code comprehension assessments than those who worked without it6. The AI made them faster. It also made them less capable of understanding what they had built.
When developers do not understand the code they maintain, everything downstream suffers. Reviews become superficial. Bug fixes introduce new bugs. Refactoring becomes too risky to attempt. The code does not stop working. It stops being understood. And that is the moment it goes dark.
Gerard Holzmann’s “First Law of Software Development” provides the structural backdrop: codebases grow exponentially regardless of team intent7. More code means more surface area that no one fully understands. AI tools have not changed this law. They have accelerated it, while simultaneously reducing the comprehension capacity of the people responsible for managing the growth.
Dark code is not dead code. Dead code can be deleted. Dark code is alive. It handles requests, processes transactions, enforces business rules. The danger is not that it fails. The danger is that when it eventually does fail, no one will know why, and no one will know how to fix it without breaking something else.
Technical Debt Is Compounding. Literally.
The accumulation of dark code is not linear. A recent quantification study found that each standard deviation increase in technical debt correlates with a 31% increase in defect density8. Worse, the compounding rate is approximately 14% per quarter when the debt is left unaddressed8.
A codebase that accumulates dark code today does not carry a static cost. The cost grows, quarter over quarter, sprint over sprint, because each uncomprehended module makes the next bug harder to find, each unrefactored clone makes the next feature harder to build safely, and each unreviewed vulnerability makes the next exploit more likely to succeed.
The teams that ignore this compounding are not saving time. They are borrowing it, at a rate that no velocity metric will capture until the bill comes due.
What the data demands, and what almost no team currently has, is a systematic framework for measuring how much of a codebase has gone dark, and a discipline for illuminating it before the compounding becomes irreversible.
The Dark Code Spectrum: A Framework for What You Cannot See
Naming the Dimensions of Darkness
Dark code is not a single problem. It is five overlapping failures that reinforce each other. Measuring only one dimension gives a false sense of safety. A team that tracks duplication but ignores ownership is measuring the shadow while missing the object casting it. A team that monitors vulnerabilities but not comprehension is patching holes in a wall it cannot see.
The Dark Code Spectrum is built from the research cited throughout this essay. Each dimension is measurable, each has severity signals drawn from empirical data, and each interacts with the others in ways that make isolated fixes almost impossible.
| Dimension | What It Measures | Indicator | Severity Signal |
|---|---|---|---|
| Clone Density | Proportion of codebase that is duplicated or copy-pasted, creating parallel paths that diverge silently | Copy/paste ratio in code changes; clone-to-unique ratio | Copy-pasted code rose from 8.3% to 12.3% of changes, a rise of nearly 50% since AI assistants went mainstream2 |
| Ownership Vacuum | Code with no identifiable author who currently understands it and is accountable for its production behavior | Files with no active committer in 12 months; core developer turnover rate | 59.7% of open-source projects experience 30%+ annual core developer turnover9 |
| Comprehension Decay | Declining ability of current team members to read, explain, and debug the code they are responsible for | Comprehension assessment scores; time-to-understand for new team members | AI-assisted developers score 17% lower on comprehension assessments6 |
| Refactoring Deficit | Structural maintenance falling behind new code generation, leaving architecture brittle and resistant to change | Refactoring-to-new-code ratio; percentage of changes that are structural improvements | Refactoring collapsed from 25% to less than 10% of all code changes2 |
| Vulnerability Surface | Security exposure from code never properly reviewed, including AI-generated code accepted without analysis | Vulnerability density per KLOC; CVE-to-code-change ratio; time-to-exploit | 23.7% more vulnerabilities in AI code3; 48,175 CVEs in 2025 with a 5-day exploit window5 |
The framework is not exhaustive. Test coverage, documentation currency, and build stability all matter too. But these five dimensions share a property that makes them particularly dangerous: they are invisible to the metrics most teams actually track. Velocity goes up. Sprint points get burned. Features ship. And behind the dashboard, the code goes dark along every one of these axes simultaneously.
These five dimensions are not independent. They form a system where each failure amplifies the others. Clone Density feeds Ownership Vacuum because duplicated code spreads responsibility across more files than any single developer can monitor. Ownership Vacuum drives Comprehension Decay because code without an active owner is code that no one is incentivized to understand. Comprehension Decay deepens the Refactoring Deficit because developers will not restructure code they cannot explain. And the Refactoring Deficit expands the Vulnerability Surface because unrefactored code accumulates the kind of structural debt that hides security flaws.
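The Ownership Vacuum dimension is among the easiest to instrument. A minimal sketch, assuming you have already mined each file's last committer and commit date from `git log` (the file paths, author names, and dates below are invented for illustration):

```python
from datetime import datetime, timedelta

def ownership_vacuum(files: dict[str, tuple[str, datetime]],
                     current_team: set[str],
                     now: datetime,
                     stale_after: timedelta = timedelta(days=365)) -> list[str]:
    """Flag files whose last committer has left the team, or whose last
    commit is older than the staleness window (12 months by default)."""
    return sorted(
        path
        for path, (author, last_commit) in files.items()
        if author not in current_team or now - last_commit > stale_after
    )

now = datetime(2026, 1, 1)
files = {
    "payments/retry.py": ("alice", datetime(2024, 6, 1)),    # author left
    "api/routes.py":     ("bob",   datetime(2025, 11, 20)),  # healthy
    "deploy/rollout.sh": ("carol", datetime(2024, 11, 1)),   # stale
}
print(ownership_vacuum(files, current_team={"bob", "carol"}, now=now))
# -> ['deploy/rollout.sh', 'payments/retry.py']
```

The ratio of flagged files to total files is one concrete number a team can watch trend over time, alongside velocity.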
Why Every Dimension Feeds the Others
The reinforcing nature of these five dimensions is easier to see through a concrete scenario.
Consider a payment processing module written eighteen months ago by a senior engineer who has since left the company. The module works. It passes every test. It processes thousands of transactions daily without incident. But the developer who understood its concurrency model, its retry logic, and its edge cases around partial refunds is gone. This is the Ownership Vacuum. Not a theoretical risk, but a specific, measurable gap in the team’s ability to maintain a critical system.
The replacement team inherits the module. They need to add support for a new payment provider. Before they can extend the code, they must first understand it, reverse-engineering the original author’s design decisions from the code itself. Research on turnover-induced knowledge loss documents exactly this pattern: remaining developers face significant productivity costs reconstructing departed colleagues’ mental models from source code alone10. The time this reverse-engineering consumes is Comprehension Decay in action.
Faced with a module they partially understand and a deadline that does not accommodate a full rewrite, the team makes the rational short-term decision: they do not refactor the existing code. They copy the payment integration pattern, modify it for the new provider, and ship it alongside the original. This is the Refactoring Deficit becoming Clone Density. The code duplicates not because the team is careless, but because refactoring code you do not fully understand carries unacceptable risk.
Now there are two parallel implementations of payment processing logic. When a vulnerability is discovered in the shared pattern (say, insufficient validation of webhook signatures) the fix must be applied in both places. But the team only knows about one, because the clone was never registered in any tracking system. The Vulnerability Surface has expanded, silently, and it will remain expanded until someone discovers the duplicate by accident or by breach.
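The webhook flaw in this scenario has a well-known remedy, which makes the cost of the hidden clone concrete: the fix is small, but it must land in every copy. A minimal verification sketch using Python's standard `hmac` module; the secret and payload here are illustrative, not any real provider's format:

```python
import hashlib
import hmac

def verify_webhook(payload: bytes, signature: str, secret: bytes) -> bool:
    """Recompute HMAC-SHA256 over the raw payload and compare it against
    the provider-supplied signature."""
    expected = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, avoiding the timing side
    # channel that a plain `==` string comparison would leak.
    return hmac.compare_digest(expected, signature)

secret = b"whsec_example"  # illustrative secret, not a real key
payload = b'{"event":"refund.completed","amount":1200}'
good_sig = hmac.new(secret, payload, hashlib.sha256).hexdigest()

print(verify_webhook(payload, good_sig, secret))         # True
print(verify_webhook(payload + b" ", good_sig, secret))  # False: tampered body
```

Ten lines of fix. The danger is not writing it; it is knowing how many places it needs to be applied.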
The empirical data confirms what the scenario illustrates. Studies of bug resolution times show that the original author of a piece of code resolves bugs 1.71 times faster than a developer encountering it for the first time11. Ownership is a measurable predictor of how quickly and safely defects are resolved. When ownership is lost, every other dimension of the Dark Code Spectrum deteriorates.
I have seen this cycle repeat across seventeen years of building and maintaining production systems. The specifics change. Sometimes it is a payment module, sometimes a deployment pipeline, sometimes a machine learning inference service. But the pattern is invariant. A departure creates a vacuum. The vacuum degrades comprehension. Degraded comprehension prevents refactoring. The absence of refactoring breeds clones. And clones expand the surface area that no one fully audits for security. Each dimension feeds the next, and the system darkens from the inside out.
Illumination Is a Practice, Not a Tool Purchase
Ownership Is the First Line of Defense Against Dark Code
The instinct is to reach for a tool. A better linter. A smarter code review bot. An AI vulnerability scanner. These are fine, but they share a limitation: they operate on code that has already been written. The most effective defense against dark code is making sure every line of production code has an owner who understands it and is accountable for its behavior.
As noted in the feedback loop analysis above, the original author of a piece of code resolves bugs 1.71 times faster than a developer encountering it for the first time11. That is the difference between a two-hour fix and a four-hour investigation that might introduce new defects. Ownership is not process overhead. It is the single largest multiplier on remediation speed a team can deploy.
The counterargument is that AI tools make ownership less important by making everyone equally capable of understanding any code. The evidence says otherwise. An evaluation of 50 AI developer tools found that most do not measurably improve team velocity, and some make teams 19% slower12. The tools that fail share a trait: they are deployed into environments where no one owns the code the tools are operating on. AI assistance without ownership is automation without accountability. The tool generates a fix. No one on the team can verify whether it is correct. The fix ships, and the code grows a little darker.
Ownership means someone can answer three questions about every module in production: what does it do, why does it exist, and what breaks if it changes. When those questions have answers, reviews are substantive, refactoring is safe, and monitoring is focused. When they do not, everything else degrades.
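One lightweight way to make that accountability explicit and machine-checkable is a CODEOWNERS file, which platforms like GitHub and GitLab use to route every change to a named reviewer. The teams and paths below are illustrative, not a recommended layout:

```
# Fallback owner first: on GitHub, later and more specific
# patterns take precedence over earlier ones.
*                 @org/eng-leads

# Every critical path maps to a team that can answer: what does it
# do, why does it exist, and what breaks if it changes.
/payments/        @org/payments-team
/deploy/          @org/platform-team
```

A CODEOWNERS file does not create understanding by itself, but it makes the Ownership Vacuum visible: any path that only matches the fallback rule is a path no one has claimed.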
Refactor Deliberately or Drown in Clones
If ownership is the first defense, disciplined refactoring is the second. Refactoring has fallen from 25% to less than 10% of all code changes. But the solution is not simply to do more refactoring. It is to do the right kind.
Research on refactored code reveals something counterintuitive: single-type refactoring operations, the kind where a tool renames a variable, extracts a method, or inlines a function in isolation, are three times more bug-prone than composite refactoring, where multiple interdependent changes are made as a coordinated restructuring13. This is exactly the kind of refactoring AI tools suggest: small, isolated, easy to accept with a single click. And exactly the kind most likely to introduce new defects.
AI-assisted refactoring may be worse than no refactoring at all if it encourages developers to accept piecemeal changes without understanding the broader structural context. Genuine refactoring, the kind that reduces dark code, requires understanding the system well enough to make coordinated changes across multiple files and abstractions. A tool that sees one function at a time cannot do that.
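What a composite refactoring looks like in miniature: rather than accepting an isolated rename or extraction, the two cloned provider handlers from the earlier scenario are consolidated into one shared flow with the provider-specific step injected. A sketch with invented names, not any real payment API:

```python
# Before: two near-identical clones, one per payment provider.
# After: one coordinated restructuring that extracts the shared flow
# and parameterizes the provider-specific step -- a composite
# refactoring, not an isolated single-type change.

from typing import Callable

def process_payment(amount_cents: int,
                    provider_charge: Callable[[int], str]) -> str:
    """Shared flow: validate, charge via the injected provider, record."""
    if amount_cents <= 0:
        raise ValueError("amount must be positive")
    receipt = provider_charge(amount_cents)
    return f"recorded:{receipt}"

# Provider-specific steps become small, independently testable functions
# instead of full copies of the whole flow. (Illustrative stubs.)
def stripe_charge(amount_cents: int) -> str:
    return f"stripe-{amount_cents}"

def adyen_charge(amount_cents: int) -> str:
    return f"adyen-{amount_cents}"

print(process_payment(500, stripe_charge))  # recorded:stripe-500
```

The point of the composite shape is that validation, error handling, and recording now exist in exactly one place, so a future fix propagates to every provider automatically instead of needing to be rediscovered in each clone.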
The economics of delay make this worse. Research on bug fix rates shows that the cost of fixing a bug grows with the time it remains unfixed14. Bugs that sit in unrefactored code do not stay the same size. They accumulate dependencies, become entangled with new features, and eventually reach a state where fixing them is indistinguishable from rewriting the module. The same compounding pattern that drives technical debt and dark code carries costs forward quarter over quarter: every quarter of deferred refactoring makes the next quarter’s refactoring more expensive and more dangerous.
Teams that treat refactoring as a first-class engineering activity, with dedicated time and the same rigor applied to feature work, build codebases that resist darkening. Teams that squeeze refactoring into the gaps between sprints, or outsource it to AI-suggested quick fixes, end up drowning in clones. A robust CI/CD pipeline built on tested patterns can enforce refactoring discipline by catching structural regressions before they reach production, but only if the humans behind the pipeline understand what they are protecting.
Your Supply Chain Is Someone Else’s Dark Code
The Dark Code Spectrum does not stop at the boundary of your repository. Every third-party dependency your application imports is someone else’s codebase, with its own ownership vacuums, comprehension gaps, refactoring deficits, and vulnerability surfaces. When you run `npm install` or `pip install`, you are not just adding functionality. You are inheriting the maintenance practices and security posture of every contributor to that dependency and its transitive dependencies.
The scale of this exposure is growing. Supply chain attack statistics for 2025 show that third-party breaches doubled to 30% of all reported breaches15. Attackers have realized that compromising a single widely-used package is more efficient than targeting individual applications. The dependency graph has become the attack surface, and most teams have no visibility into the dark code they are importing.
This reframes supply chain security as a dark code problem rather than a procurement problem. The question is not whether your vendor passed a security audit. The question is whether anyone, at any point in the dependency chain, understands the code that is executing in your production environment. If the answer is no, then your supply chain is dark, and no amount of scanning will illuminate what no human has ever read.
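The inheritance problem is ultimately a graph problem: a handful of direct dependencies expands into a transitive closure no one on the team has read. A minimal sketch over a toy dependency graph; the package names are invented, not real packages:

```python
def transitive_deps(graph: dict[str, list[str]], root: str) -> set[str]:
    """Everything `root` ultimately pulls in, direct or transitive."""
    seen: set[str] = set()
    stack = [root]
    while stack:
        pkg = stack.pop()
        for dep in graph.get(pkg, []):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen

# Toy dependency graph (illustrative, not real package metadata).
graph = {
    "myapp": ["web-framework", "http-client"],
    "web-framework": ["templating", "http-client"],
    "http-client": ["tls-lib"],
    "tls-lib": [],
    "templating": [],
}
print(sorted(transitive_deps(graph, "myapp")))
# -> ['http-client', 'templating', 'tls-lib', 'web-framework']
```

Two declared dependencies quietly become four packages whose maintenance practices you now inherit. On a real project, the same traversal over a lockfile routinely turns dozens of direct dependencies into hundreds of transitive ones.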
Software Was Never Meant to Be Disposable
AI did not create dark code. It removed the guardrails.
It would be convenient to frame AI coding assistants as the villain here. But dark code existed decades before the first language model generated a line of Python. Every codebase older than its current team has it: modules written by people who left, integrations built for requirements no one remembers, error handling paths that were never tested because no one could reproduce the conditions they guard against. The problem is not new. The pace is.
The Defense Innovation Board’s report to the United States Congress said it plainly in the title: “Software Is Never Done.”16 Software must be treated as a living system requiring continuous maintenance, not as a deliverable that ships once and is forgotten. The moment an organization treats software as complete, that software begins to die, not by crashing, but by becoming incomprehensible to the people who will inevitably need to change it.
AI tools did not violate this principle. They removed the natural friction that once enforced it. Before AI coding assistants, writing code was slow. That slowness was costly, but it was also a forcing function: if generating a new module took a full afternoon, you had a strong incentive to search for an existing one first, to understand what was already there, to refactor rather than duplicate. The time cost of writing code was, in retrospect, a guardrail against the overproduction of code no one understood. AI removed that guardrail entirely. The speed is genuine. The comprehension it displaces is the cost no one is accounting for.
After seventeen years of building and maintaining production systems, the pattern I have seen more than any other is the quiet inheritance, not the dramatic failure: you take ownership of a system where the documentation is the code itself, the original authors are three companies removed, and the only thing keeping it running is inertia and the fact that no one has been brave enough to change it. Those systems are not broken. They are dark. And every team I have worked with has at least one of them. If you have read The Beauty Index, you know I believe code should be written to be read, not merely to be executed. Dark code is the antithesis of that conviction: code written to run, with no consideration for the human who will one day need to understand it.
The acceleration that AI provides is real, and rejecting it entirely is not the answer. But using it without measurement, without ownership, without the deliberate practice of comprehension, is how a codebase goes from productive to dark in a single quarter. AI amplifies whatever process it is applied to. A well-maintained codebase with strong ownership gets more productive. A codebase that is already fraying rots faster.
Measure It, Own It, Illuminate It
Your codebase has dark code. The question is whether you are measuring how much, and whether that measurement is driving action.
The Dark Code Spectrum provides the diagnostic. What follows is the discipline.
Start by measuring. Apply the five dimensions to your codebase, not once, not as a quarterly exercise, but as a continuous signal alongside velocity and deployment frequency. The metrics you ignore are the ones that define your next incident. A responsible approach to AI-assisted development includes measuring downstream effects, not just upstream speed.
Then own it. Every module in production should have someone who can answer: what does it do, why does it exist, and what breaks if it changes. When those answers live in someone’s head rather than in documentation, they are one resignation away from disappearing. Ownership is the difference between code that can be maintained and code that can only be replaced.
Then illuminate it. Instrument the code paths no one monitors. Review the modules no one has touched in twelve months. Test the error handling written for conditions no one has reproduced. This is not a one-time audit. It is a continuous effort to push back against the tendency of complex systems to accumulate darkness.
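The "measure it" step can be reduced to a concrete, periodic signal. One possible shape for that snapshot, with field names invented and thresholds loosely anchored to the figures cited above; every cutoff should be calibrated against your own baseline, not treated as a standard:

```python
from dataclasses import dataclass

@dataclass
class DarkCodeSnapshot:
    """One periodic reading of the five spectrum dimensions.
    All names and thresholds are illustrative."""
    clone_ratio: float           # share of changes that are copy/paste
    unowned_file_ratio: float    # files failing the ownership check
    comprehension_score: float   # e.g. team assessment pass rate, 0..1
    refactor_ratio: float        # share of changes that are structural
    vulns_per_kloc: float        # exploitable findings per 1,000 lines

    def alerts(self) -> list[str]:
        out = []
        if self.clone_ratio > 0.123:    # the post-AI average reported above
            out.append("clone density above post-AI industry average")
        if self.unowned_file_ratio > 0.30:
            out.append("ownership vacuum spreading")
        if self.comprehension_score < 0.50:
            out.append("comprehension decay")
        if self.refactor_ratio < 0.10:  # the floor refactoring fell through
            out.append("refactoring below sustainable share")
        if self.vulns_per_kloc > 1.0:
            out.append("vulnerability surface growing")
        return out

snap = DarkCodeSnapshot(clone_ratio=0.15, unowned_file_ratio=0.4,
                        comprehension_score=0.8, refactor_ratio=0.07,
                        vulns_per_kloc=0.2)
print(snap.alerts())
```

Emitted weekly alongside velocity and deployment frequency, a snapshot like this turns "the codebase is going dark" from a feeling into a trend line someone is accountable for reversing.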
Code clones are multiplying. Refactoring is collapsing. Comprehension is declining. Vulnerabilities are compounding. Every one of these trends is measurable, and every one is reversible, but only through deliberate action. The codebase does not go dark by accident. It goes dark by default, one unread module at a time, one skipped review at a time, one accepted suggestion at a time.