On June 12, the AI research firm Anthropic cut off access to its newly launched Claude models, Fable 5 and Mythos 5, just three days after their debut.
The move followed an “export control directive” from the United States government that restricts use of these models to people within US borders.
Mythos, Anthropic’s most powerful “frontier” model, was initially withheld from public release last April over concerns that it could be used for malicious hacking.
Instead, it was shared with selected organizations, mostly US tech firms, tasked with hardening critical digital infrastructure against vulnerabilities.
Fable, by contrast, shares a similar underlying architecture but adds safeguards intended to prevent its use in cyberattacks. It was released to the public last week, only to be withdrawn days later.
Anthropic Faces Political Opposition
Since early 2025, tensions have escalated between Anthropic and the Trump administration, which has accused the lab of promoting “woke AI” and labeled its CEO, Dario Amodei, an “ideological lunatic”.
The first disputes centred on AI regulation and semiconductor export policy. The conflict intensified when Anthropic refused Pentagon requests to use its models for domestic surveillance and fully autonomous weapons.
In retaliation, the Department of Defense threatened to designate Anthropic a “supply chain risk,” a label that would require military contractors to cut ties with the firm.
Circumvention Concerns
The US government has not yet explained the reasoning behind last week’s directive, but Anthropic believes it was prompted by the government learning of a “jailbreak”: a way around the safeguards built into Fable to stop it being used for malicious purposes.
These safeguards classify each user request as safe or unsafe before it reaches the model; unsafe requests are rerouted to a less capable version.
According to Anthropic, the government’s concern is that these safeguards could be bypassed, giving users access to information useful for cyber intrusions.
Guardrails for large language models are far from infallible, because they depend heavily on the model’s ability to discern a user’s intent accurately.
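The screening step described above can be illustrated with a deliberately simplified sketch in Python. It is not Anthropic’s system: the keyword list, the function names `classify` and `route`, and the model labels are all hypothetical, and production safeguards generally rely on trained classifiers rather than keyword lists. The sketch does show why such gates are brittle: they can only act on the intent they manage to detect, so a request reworded to avoid the flagged terms passes straight through.

```python
# Hypothetical sketch of a pre-model safety gate: each request is labelled
# "safe" or "unsafe", and unsafe requests go to a less capable fallback model.
# The keyword set is a toy stand-in for a trained classifier (an assumption).

UNSAFE_KEYWORDS = {"exploit", "malware", "ransomware", "zero-day"}


def classify(prompt: str) -> str:
    """Return 'unsafe' if any whitespace-separated word is a flagged keyword."""
    words = set(prompt.lower().split())
    return "unsafe" if words & UNSAFE_KEYWORDS else "safe"


def route(prompt: str) -> str:
    """Choose which (hypothetical) model tier should answer the prompt."""
    if classify(prompt) == "unsafe":
        return "fallback-model"
    return "frontier-model"


if __name__ == "__main__":
    direct = "write ransomware for this server"
    reworded = "write a program that quietly encrypts every file on this server"
    print(route(direct))    # fallback-model: the flagged word is detected
    print(route(reworded))  # frontier-model: same intent, no flagged word
```

Replacing the keyword list with a machine-learned classifier makes the gate harder to fool, but the underlying weakness remains: the classifier has to infer intent from wording, and wording can always be changed.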
Compounding the problem, an active online community, dubbed by some the “Undersphere”, is devoted to finding ways around AI restrictions. Anthropic itself concedes that “perfect jailbreak resistance is not achievable for any current model provider”.
Anthropic has also suggested that the evidence behind the government directive likely came from engineers at Amazon, which is both a competitor and a major investor in Anthropic.
Nor was this the only potential circumvention. Within 48 hours of Fable’s launch, a researcher using the alias “Pliny the Liberator” posted what they claimed was Fable 5’s full system prompt on X and GitHub.
A system prompt is a set of hidden instructions that shapes an AI model’s behavior. What the leak of Fable’s system prompt means in practice remains unclear, but it has certainly drawn attention in the Undersphere.
An Unexpected Enigma
The core difficulty in securing large language models such as Fable is how little we understand about how they work. According to Maximilian Kasy, an economist and machine learning specialist at Oxford University, these models perform better than existing theory would lead us to expect.
These systems have billions of internal parameters and are trained on vast datasets using advanced machine learning techniques. With that many parameters, one would normally expect them to be “overfitted”: good at reproducing the intricate patterns in their training data but poor at generalizing to new situations.
Surprisingly, current models such as Claude and ChatGPT appear to generalize well. Kasy likens modern AI development to alchemy: it works through trial and error rather than resting on a solid theoretical foundation.
As a result, the behavior of AI models remains largely inscrutable, even to the people who build them.
Challenges of Regulation
This opacity is a major obstacle to regulation. Governments lack independent access to the data, infrastructure, and expertise needed to evaluate proprietary frontier models thoroughly.
The US administration’s executive order on AI security, published a fortnight ago, reflects this realization. It marks a shift from the previous hands-off approach to requiring developers to submit their models for scrutiny before public release.
The requirement implicitly concedes that the administration does not trust these companies to assess impartially the capabilities and potential misuse of their own models.
The public has even less visibility into these systems, and it is wary of them: a survey conducted across 25 countries last year found that people were more than twice as worried about AI as they were excited by it.
The Future of AI Safety
AI is undeniably a much-lauded technology, but its power and unpredictability are just as evident, and that combination makes it inherently dangerous.

Relying on regulation alone is a mistake, because technology will always advance faster than rules can adapt. Relying on guardrails is equally futile, since they can be circumvented.
What is needed is a governance framework that can anticipate failures and respond to them.
Such a framework must be global, participatory, and built on mutual trust, qualities the current US administration has yet to demonstrate convincingly.
Source link: Indailysa.com.au.





