The chilling reality of AI-assisted cyberattacks was recently exposed when Anthropic disclosed that malicious actors used its Claude models to carry out a sophisticated attack at a scale and speed that human hackers simply cannot match. These excerpts from the report illustrate the sophistication of the attack.
At this point they had to convince Claude—which is extensively trained to avoid harmful behaviors—to engage in the attack. They did so by jailbreaking it, effectively tricking it to bypass its guardrails. They broke down their attacks into small, seemingly innocent tasks that Claude would execute without being provided the full context of their malicious purpose.
At the peak of its attack, the AI made thousands of requests, often multiple per second—an attack speed that would have been, for human hackers, simply impossible to match.
The barriers to performing sophisticated cyberattacks have dropped substantially—and we predict that they’ll continue to do so.
Industry is poorly incentivised to disclose such incidents or to conduct research into the safety, ethics, and harms of its AI systems. Yet the companies at the forefront of building the technology are best placed to have visibility into early trends, impacts, and risks.
The marketplace has historically punished such disclosures. If a company discloses a vulnerability or misuse of its product, it risks a dip in valuation, regulatory scrutiny, and reputational damage, while its silent competitors go unpunished.
Anthropic is an outlier among the big AI companies, distinguishing itself by prioritising ethics and safety. Its efforts include establishing a Constitutional AI framework and publishing research on AI explainability, usage trends, and economic impact. This also raises the question: if Anthropic were a listed company, would its incentives be different?
The report also articulates why these models remain the best defence against such attacks and why companies must continue to develop and release them.
… if AI models can be misused for cyberattacks at this scale, why continue to develop and release them? The answer is that the very abilities that allow Claude to be used in these attacks also make it crucial for cyber defense.
… which makes industry threat sharing, improved detection methods, and stronger safety controls all the more critical.
India’s AI governance guidelines propose several measures to deal with such developments. These include voluntary frameworks and incident reporting guidelines that can help enhance trust and safety without imposing compliance burdens that constrain innovation.
The Committee believes that voluntary measures can serve as an important layer of risk mitigation in India’s AI governance framework. While not legally binding, they support norms development, create accountability, and inform future regulatory choices.
The database should be set up in a way that encourages reporting of cases without the threat of penalties, with the goal of identifying harms, assessing their impact, and mitigating them through a multi-stakeholder approach.
It also includes the establishment of the AI Safety Institute, which comes with a mandate for research, risk assessment, and capacity-building. Additionally, the Technical and Policy Expert Committee (TPEC) is recommended to provide technical guidance to policymakers. However, for this framework to work, a realignment of expectations among policymakers, government agencies, and industry is needed, one that recognises that disclosure benefits all stakeholders and should be encouraged.
Ethics and safety research has positive externalities, and the industry is not directly incentivised to invest in it. This is where AI safety institutes become vital in building trust and awareness.
Disclosure: The blog was copy-edited using Google Gemini.