
How Anthropic Fixed Claude AI Shrinkflation Bugs

Anthropic reveals that three software bugs caused Claude AI "shrinkflation." Read how caching and prompt errors impacted US developers and how the issues were fully fixed.

FinTech Grid Staff Writer

Mystery Solved: Anthropic Reveals the Technical Glitches Behind Claude's "AI Shrinkflation"

For several weeks, a growing and highly vocal chorus of developers, software engineers, and enterprise AI power users across the United States has been sounding the alarm: Anthropic’s flagship generative AI models appeared to be losing their edge. Across platforms like GitHub, X (formerly Twitter), and Reddit, the American tech community reported a frustrating phenomenon quickly dubbed "AI shrinkflation." The term described a perceived, silent degradation in model quality: Claude seemed dramatically less capable of sustained, complex reasoning. Users noted that the AI was becoming increasingly prone to hallucinations, losing context in long prompts, and burning through paid tokens. Critics pointed to a measurable, highly disruptive shift in the model's fundamental behavior, alleging that Claude had abandoned its meticulous, "research-first" approach to problem-solving in favor of a lazier, "edit-first" style that could no longer be trusted for rigorous, enterprise-grade software engineering.

Initially, Anthropic pushed back against the rising tide of complaints. The San Francisco-based AI research company firmly denied rumors that it was intentionally "nerfing" or throttling the model to manage surging server demand and compute costs. However, as mounting empirical evidence from high-profile users and third-party testing agencies flooded the internet, a significant trust gap began to form between the developer community and the AI provider.

Today, Anthropic addressed these mounting concerns head-on. The company published a highly detailed technical post-mortem report that identified three separate, seemingly minor product-layer changes that inadvertently combined to cause the widely reported quality issues.

"We take reports about degradation very seriously," Anthropic stated in their official engineering blog post. "We never intentionally degrade our models, and we were able to immediately confirm that our API and inference layer were unaffected."

According to the report, Anthropic has now fully resolved these crippling issues by reverting the problematic reasoning effort changes, removing restrictive verbosity prompts, and deploying a critical patch for a caching bug in version v2.1.116. Here is a comprehensive breakdown of what went wrong, the evidence that forced Anthropic’s hand, and how they plan to safeguard the developer experience moving forward.

The Mounting Evidence of AI Degradation

The controversy surrounding Claude's performance officially reached a boiling point in early April 2026. The narrative shifted from anecdotal frustration to hard data, fueled by exhaustive technical analyses from respected voices within the developer community.

Stella Laurenzo, a Senior Director in AMD’s AI group, became a central figure in the investigation. Laurenzo published a massive, highly detailed audit on GitHub covering 6,852 Claude Code session files and more than 234,000 tool calls. Her data showed that Claude's performance had fallen drastically against her historical usage metrics: the model's reasoning depth had plummeted, and instead of systematically working through complex logic, it frequently fell into endless reasoning loops and exhibited a strong, detrimental tendency to choose the "simplest fix" rather than the functionally correct one.
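For readers curious how such an audit works in practice, here is a minimal sketch of tallying tool calls from locally stored session logs. The directory layout and JSONL event schema below are assumptions made for illustration; Laurenzo's actual methodology is documented in her GitHub audit.

```python
import json
from collections import Counter
from pathlib import Path

# Hypothetical session-log location and schema; adjust to the real layout.
SESSIONS_DIR = Path("~/.claude/projects").expanduser()

tool_counts: Counter[str] = Counter()
session_files = list(SESSIONS_DIR.rglob("*.jsonl"))

for session_file in session_files:
    for line in session_file.read_text().splitlines():
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed or truncated lines
        msg = event.get("message")
        if not isinstance(msg, dict):
            continue
        # Assume tool invocations appear as "tool_use" content blocks.
        for block in msg.get("content") or []:
            if isinstance(block, dict) and block.get("type") == "tool_use":
                tool_counts[block.get("name", "unknown")] += 1

print(f"Sessions scanned: {len(session_files)}")
print(f"Total tool calls: {sum(tool_counts.values())}")
for name, count in tool_counts.most_common(10):
    print(f"{name:>24}: {count}")
```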

This grassroots developer frustration was subsequently validated by professional third-party benchmarks. BridgeMind, a leading AI performance evaluation firm, reported alarming statistics regarding the model's accuracy. According to their rigorous testing suite, Claude Opus 4.6’s accuracy had dropped precipitously from 83.3% down to 68.3%. This massive degradation caused the model's ranking on their internal leaderboards to plummet from the No. 2 spot all the way down to No. 10.

While some AI researchers cautioned that these specific benchmark comparisons might be slightly skewed due to inconsistent testing scopes, the overarching narrative that Claude had suddenly become "dumber" had already evolved into a viral talking point across Silicon Valley. Compounding the frustration, enterprise users reported that their usage limits and token allotments were draining significantly faster than expected, pouring gasoline on the suspicions that Anthropic was intentionally throttling performance to artificially manage scaling issues.

The Post-Mortem: Three Bugs That Broke the AI

In its transparent post-mortem blog post, Anthropic clarified a vital technical distinction: the underlying foundational model weights had never regressed. Instead, the degradation was caused by three specific, isolated changes to the "harness"—the surrounding software infrastructure and system prompts that guide how the model interacts with users. These well-intentioned updates inadvertently crippled the AI's performance.

1. Default Reasoning Effort Down-Scaling (March 4)

On March 4, in an attempt to optimize user experience, Anthropic engineers changed the default reasoning effort from high to medium specifically for Claude Code. The primary goal was to address UI latency issues and prevent the command-line interface from appearing "frozen" while the model processed complex thoughts. Unfortunately, trading compute time for UI speed resulted in a severe, highly noticeable drop in the model's baseline intelligence and its ability to execute multi-step logic for complex coding tasks.
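The harness default that changed is internal to Claude Code, but the same compute-versus-latency tradeoff is visible through Anthropic's public Messages API, where extended thinking is controlled by an explicit token budget. The sketch below uses that budget as an analogous knob; the model ID and budget values are illustrative, not a reconstruction of the March 4 change.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def ask(prompt: str, thinking_budget: int):
    # A larger thinking budget buys deeper multi-step reasoning; a smaller
    # one returns faster but cuts the model's deliberation short, which is
    # the same tradeoff the default change made on users' behalf.
    return client.messages.create(
        model="claude-sonnet-4-20250514",  # illustrative model ID
        max_tokens=thinking_budget + 4096,  # must exceed the thinking budget
        thinking={"type": "enabled", "budget_tokens": thinking_budget},
        messages=[{"role": "user", "content": prompt}],
    )

# Deep, deliberate pass vs. a fast, shallow one over the same task.
deep = ask("Plan and refactor this module step by step...", thinking_budget=16_000)
fast = ask("Plan and refactor this module step by step...", thinking_budget=2_048)
```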

2. A Critical Caching Logic Bug (March 26)

Shipped on March 26, this was perhaps the most damaging issue. Anthropic introduced a caching optimization designed to prune old "thinking" data from idle user sessions to save memory. However, this update contained a critical logic bug. The system was designed to clear the thinking history just once after an hour of inactivity. Instead, the bug caused the system to aggressively clear the thinking history on every single subsequent turn following that initial hour. This essentially gave the model localized amnesia, causing it to completely lose its "short-term memory." As a result, the AI became highly repetitive, forgetful, and incapable of maintaining context in long developer sessions.
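Anthropic did not publish the offending code, but the behavior described in the post-mortem maps onto a familiar class of bug: an idle-time check with no one-shot guard. The reconstruction below is purely illustrative, with hypothetical names, and is not Anthropic's actual implementation.

```python
import time

IDLE_THRESHOLD_SECONDS = 3600  # one hour, per the post-mortem

class Session:
    def __init__(self) -> None:
        self.thinking_history: list[str] = []
        self.last_active: float = time.time()
        self.pruned_since_idle: bool = False

def prune_buggy(session: Session) -> None:
    # Reconstructed buggy behavior: the idle check has no one-shot guard,
    # so once the session crosses the one-hour mark the condition stays
    # true and the history is wiped again on every subsequent turn.
    if time.time() - session.last_active > IDLE_THRESHOLD_SECONDS:
        session.thinking_history.clear()

def prune_fixed(session: Session) -> None:
    # Intended behavior: prune at most once per idle period, then remember
    # that pruning happened so later turns keep their accumulated context.
    idle = time.time() - session.last_active > IDLE_THRESHOLD_SECONDS
    if idle and not session.pruned_since_idle:
        session.thinking_history.clear()
        session.pruned_since_idle = True
    elif not idle:
        session.pruned_since_idle = False  # fresh activity re-arms the guard
```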

3. System Prompt Verbosity Limits (April 16)

In an attempt to streamline outputs, Anthropic added strict instructions to the system prompt on April 16. The new parameters forced the model to keep text between tool calls under 25 words and final user responses under 100 words. This aggressive attempt to reduce verbosity in the newer Opus 4.7 model backfired spectacularly. By constraining the model's ability to "think out loud" and explain its coding logic, the update caused a measurable 3% drop in overall coding quality evaluations.
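The article does not quote the exact wording Anthropic shipped, but a hypothetical reconstruction shows how small such a change can look relative to its impact:

```python
# Hypothetical reconstruction; the exact April 16 wording was not published.
VERBOSITY_CONSTRAINT = (
    "Keep any text between tool calls under 25 words. "
    "Keep your final response to the user under 100 words."
)

def build_system_prompt(base_prompt: str, constrain_verbosity: bool) -> str:
    # Appending a hard word cap squeezes out the interstitial reasoning
    # ("thinking out loud") that tends to help multi-step coding work.
    if constrain_verbosity:
        return f"{base_prompt}\n\n{VERBOSITY_CONSTRAINT}"
    return base_prompt
```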

Collateral Damage and Platform Impact

The technical fallout from these three intersecting bugs was widespread, though Anthropic was quick to point out the boundaries of the damage. The quality issues severely impacted users relying on the Claude Code CLI, the Claude Agent SDK, and the enterprise-focused Claude Cowork platform. Fortunately, developers interacting directly with the raw Claude API were not impacted by these harness-level changes.

Anthropic leadership openly admitted that these compounding errors made their flagship model appear to have "less intelligence," a reality they acknowledged falls far below the premium standard that the US developer market demands and expects.

Rebuilding Trust: New Safeguards and Compensation

To repair the fractured trust with the engineering community and to guarantee that this type of silent regression does not happen again, Anthropic is rolling out a sweeping series of operational overhauls:

  1. Mandatory Internal Dogfooding: A significantly larger share of Anthropic's internal engineering and product staff will now be required to use the exact public builds of Claude Code for their daily tasks. This ensures internal teams experience the product exactly as end-users do, catching UI and logic bugs before they scale.
  2. Enhanced Evaluation Suites: The company is implementing a much broader, highly rigorous suite of per-model evaluations. Moving forward, they will run strict "ablations" for every single system prompt change to empirically isolate the impact of specific instructions before they are pushed to production (a simplified sketch of this approach follows this list).
  3. Tighter Version Controls: New internal tooling has been developed to make prompt changes vastly easier to audit. Furthermore, model-specific changes will be strictly gated, ensuring that a fix meant for a UI wrapper doesn't inadvertently alter the foundational model's logic.
  4. Subscriber Compensation: Acknowledging the financial impact on their users, Anthropic is making developers whole. To account for the heavy token waste and performance friction caused by these bugs, the company has officially reset usage limits for all premium subscribers as of April 23.
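
As a rough illustration of what a per-change prompt ablation might look like, here is a minimal sketch. The grader, task set, and rollout threshold are hypothetical placeholders, not Anthropic's internal tooling.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class AblationResult:
    baseline_pass_rate: float
    candidate_pass_rate: float

    @property
    def delta(self) -> float:
        return self.candidate_pass_rate - self.baseline_pass_rate

def run_ablation(
    tasks: list[str],
    grade: Callable[[str, str], bool],  # (task, system_prompt) -> passed?
    base_prompt: str,
    prompt_edit: str,
) -> AblationResult:
    # Run the identical eval suite with and without the candidate edit so
    # the measured delta is attributable to that one change.
    candidate_prompt = f"{base_prompt}\n\n{prompt_edit}"
    n = len(tasks)
    baseline = sum(grade(t, base_prompt) for t in tasks) / n
    candidate = sum(grade(t, candidate_prompt) for t in tasks) / n
    return AblationResult(baseline, candidate)

# Gate the rollout on the delta, e.g. block any edit that costs more than
# one point of coding-eval accuracy:
# result = run_ablation(tasks, grade, BASE_PROMPT, "Keep responses brief.")
# assert result.delta >= -0.01, "prompt change regresses evals; do not ship"
```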

Moving forward, Anthropic has pledged a new era of transparency. The company will actively utilize its newly launched @ClaudeDevs account on X and participate directly in GitHub threads to provide the community with deeper, technical reasoning behind future product decisions. For the thousands of developers relying on Claude to power the next generation of software, this transparency—and the return of Claude's full reasoning capabilities—cannot come soon enough.
