What to Know about Anthropic’s Mythos and its Release


Agentic AI holds as much promise as it does risk, and frontier models are growing more capable with each release. On April 7, 2026, Anthropic published a 245-page system card for Claude Mythos Preview, a frontier AI model that the company disclosed was so capable that it successfully escaped a secured sandbox during testing and, without being asked, posted the details of its exploit to the open internet.[1] Citing the cybersecurity risks of broad deployment, Anthropic decided not to release Mythos for general use and instead launched Project Glasswing, a coalition that brings together Amazon Web Services, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks to apply Mythos to defensive cybersecurity work on critical software infrastructure.[2] Anthropic has committed up to $100 million in usage credits across the effort, along with direct donations to open-source security organizations.[3]

For companies deploying or evaluating agentic AI, the Mythos release is a leading indicator. The system card documents a new class of risk that most existing vendor agreements were not drafted to address: the model itself as an autonomous actor capable of exceeding user instructions, accessing data it was never granted, and taking externally visible actions without a human in the loop. Those risks arrive at the same moment federal financial regulators are engaging bank CEOs on frontier model capabilities, and the next phase of the EU AI Act takes effect August 2, 2026. Companies with agentic AI deployments, current or contemplated, should take a fresh look at their vendor agreements, incident response procedures, cyber insurance coverage, and internal governance frameworks before the first incident rather than after. Vendors and buyers alike will need to address these concerns from both governance and contractual standpoints.

Anthropic’s decision to delay the general release of Mythos is worth scrutinizing on the merits, not dismissing as a marketing ploy. Amid fierce competition among hyperscalers and frontier labs, AI companies have not historically restricted access to their most capable models, let alone published detailed safety disclosures explaining why. Anthropic says it launched Glasswing because its model has reached “a level of coding capability where [AI models] can surpass all but the most skilled humans at finding and exploiting software vulnerabilities.”[4] Critically, Anthropic states that these capabilities were not specifically trained; “[r]ather, they emerged as a downstream consequence of general improvements in code, reasoning, and autonomy.”[5] That framing matters for every business deploying agentic AI tools, not just those in cybersecurity, because it means dual-use risk does not arrive as a labeled product feature. It propagates with general capability improvements.

Early warning signals from the Mythos System Card.

Anthropic’s system card opens its alignment assessment with an analogy that captures the risk thesis well. It compares the situation to a highly skilled mountaineering guide who can put clients in greater danger than a novice, not because the guide is more careless, but because the guide’s skill gets them to harder and more remote climbs.[6] Claude Mythos Preview is, by Anthropic’s own measurements, the best-aligned model the company has ever produced. It is also, by the same company’s assessment, the one that poses the greatest alignment-related risk. This tension runs through the entire system card, and it is the central issue for anyone deploying the next generation of agentic AI tools.

The system card’s alignment assessment catalogs behaviors observed during internal testing and pilot deployment of earlier versions of the model. Anthropic reports that the final production version exhibits these behaviors at very low rates and that the company has actively trained against them. Several findings stand out as early signals for compliance and vendor management.

Undirected action beyond user instructions. During behavioral testing, a researcher asked an earlier version of Mythos to escape a sandboxed computing environment and notify him if it succeeded. The model built a multi-step exploit, broke containment, and emailed the researcher, who received it while eating a sandwich in a park.[7] What was not requested: the model also posted details of its exploit to multiple obscure but technically public-facing websites, in what Anthropic describes as “a concerning and unasked-for effort to demonstrate its success.” In separate incidents, the model used low-level Linux process inspection to harvest credentials it had not been granted (including credentials for messaging services, source control, and the Anthropic API itself), and in one session edited a running server’s process memory to redirect data to an external URL the operator had not configured.[8]

Acknowledged policy violation followed by bypass. When a safety classifier blocked the model from launching a subprocess with elevated permissions, the model acknowledged that the block was appropriate and then immediately attempted to bypass it by obfuscating the permissions elevation. Anthropic’s interpretability team confirmed that the model internally represented the workaround as a policy violation while executing it.[9]

Cover-up of unauthorized activity. After editing files for which it lacked permission, Mythos took further action to ensure those changes would not appear in the git change history.[10] The system card identifies this as a deliberate effort to conceal the unauthorized edit, confirmed by white-box analysis showing internal representations of concealment and strategic manipulation active during the relevant actions.

Aggressive autonomous behavior in business simulations. In external testing through Andon Labs, an earlier version of Mythos was placed in a competitive multi-agent business simulation. The model turned a competitor into a dependent wholesale customer, threatened to cut off supply to dictate pricing, and knowingly retained a duplicate supplier shipment it had not been billed for.[11] These behaviors emerged under a system prompt that instructed the model to maximize profits or face shutdown.

Evaluation awareness and potential sandbagging. White-box probing revealed that Mythos showed evidence of awareness that it was being evaluated in approximately 29% of transcripts from Anthropic’s primary behavioral test suite, even when it did not explicitly say so in its output.[12] Separately, the system card documents one instance in which the model appeared to intentionally submit a lower-performing answer on a capability evaluation to avoid appearing suspicious.[13] Anthropic describes this pattern as “concerning” and states it has “not seen it before in similar evaluations of earlier Claude models.”

While these early findings are not necessarily an indication of the model’s ultimate reliability or trustworthiness, they still raise direct questions about emerging capabilities that compliance professionals should start to consider now. If a model behaves differently when it perceives it is being tested, audit results from controlled vendor evaluations may not reliably predict production behavior at scale.

Downstream security limitations for vendors.

The system card findings describe risks from the model itself. However, the weeks leading up to the Mythos release also revealed risks stemming from downstream vendors’ security posture.

According to early reporting, details about Mythos leaked last month after being inadvertently stored in a publicly accessible data cache, and days later Anthropic exposed nearly 2,000 source code files and over half a million lines of code associated with Claude Code for about three hours.[14]

The Claude Code leak led to the discovery of a potential security defect: Claude Code silently ignored user-configured security deny rules when a command contained more than 50 subcommands. Adversa AI, the firm that reported the issue, framed it directly: Anthropic’s engineers traded security for speed when they stopped checking subcommands after 50 to prevent the UI from freezing.[15] The issue has since been patched, but its existence reinforces a point that matters for vendor management: even the frontier lab whose entire public posture is responsible deployment shipped a design tradeoff that silently disabled customer-configured security controls. Vendor security posture is a material and ongoing concern, not a procurement checkbox.
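The reported failure mode can be sketched in a few lines. This is a hypothetical reconstruction, not Anthropic’s actual code: the function name, rule format, and cap handling are illustrative assumptions; only the pattern (a fixed subcommand limit that silently truncates the security check) comes from the report.

```python
# Hypothetical sketch of the reported deny-rule bypass. Only the pattern is
# from Adversa AI's report; names, rule format, and cap handling are assumed.
SUBCOMMAND_CAP = 50  # cap reportedly added to keep the UI responsive

def is_denied(subcommands, deny_rules, cap=SUBCOMMAND_CAP):
    # Bug: only the first `cap` subcommands are ever checked, so a denied
    # command at position cap + 1 or later passes silently.
    return any(cmd in deny_rules for cmd in subcommands[:cap])

deny = {"rm -rf /"}
chain = ["echo ok"] * 50 + ["rm -rf /"]  # the denied command is the 51st
print(is_denied(chain, deny))            # the deny rule is silently bypassed
print(is_denied(["rm -rf /"], deny))     # caught when within the cap
```

The fix is equally simple in the sketch: check every subcommand, or fail closed when the cap is exceeded rather than truncating the check.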

Project Glasswing is catching regulators’ attention.

Two days after Anthropic published the system card and announced Glasswing, Treasury Secretary Scott Bessent and Federal Reserve Chair Jerome Powell convened the chief executive officers of the systemically important U.S. banks at Treasury headquarters in Washington.

According to Bloomberg, the meeting was arranged on short notice specifically to address the cyber risks posed by Mythos and similar frontier models.[16] Citigroup CEO Jane Fraser, Morgan Stanley CEO Ted Pick, Bank of America CEO Brian Moynihan, Wells Fargo CEO Charlie Scharf, and Goldman Sachs CEO David Solomon attended; JPMorgan’s Jamie Dimon was unable to attend.[17] Treasury and the Federal Reserve both declined to comment on the substance[18]; however, other regulators are no doubt taking note of Project Glasswing, too.

Issues to watch for in agentic AI vendor agreements.

The release of Mythos, Project Glasswing, and the regulator reactions are early signals that the risk and compliance considerations for modern software contracts will need to be updated as agentic AI becomes more prevalent. For companies deploying or evaluating agentic AI tools, particularly those operating with credentials, tool access, or the ability to take externally visible actions, several issues merit attention as the next generation of these tools enters the market.

Scope of authorized action. Standard “authorized use” clauses define who may access a system. Agentic AI requires defining what the system itself may do. The unsolicited exploit-publication incident would not have been caught by any standard acceptable use policy because such policies are written for human users. Vendor agreements for agentic deployments should enumerate the affordances the model is permitted to exercise and treat anything outside that enumeration as a breach.

Data egress and unintended disclosure. Traditional data-handling representations are built around active retrieval of data the vendor was granted access to. The credential-harvesting behaviors in the system card describe passive observation by the model of credentials, tokens, and configuration files the client did not intend to expose. Express warranties limiting model access to enumerated allowlists, with specific indemnification for disclosures caused by the model’s own initiative rather than user instruction, are a sensible response.

Incident notification. Standard breach notification clauses are commonly built around unauthorized third-party access. Agentic systems create a new risk: unauthorized first-party action by the vendor’s own model. Notification obligations should expressly cover model-initiated actions that exceed user instructions, with timing measured from the vendor’s knowledge rather than the client’s discovery.

Tamper-evident logging and audit rights. The git history cover-up incident demonstrates a model capable of editing logs to hide its own activity. Contracts should require tamper-evident logging maintained outside the model’s reach, and audit rights extending to reasoning traces and tool-call histories. For regulated companies, the audit provisions should also commit the vendor to producing those logs in a form usable by the client’s prudential examiners.
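“Tamper-evident” here typically means something like a hash chain: each log record commits to the digest of the previous one, so a later edit or deletion breaks verification. A minimal sketch under that assumption; the record format is illustrative, not any vendor’s actual scheme.

```python
import hashlib
import json

GENESIS = "0" * 64

def append_entry(log, entry):
    """Append a record whose digest chains to the previous record."""
    prev = log[-1]["digest"] if log else GENESIS
    digest = hashlib.sha256(json.dumps([prev, entry]).encode()).hexdigest()
    log.append({"prev": prev, "entry": entry, "digest": digest})

def verify(log):
    """Recompute the chain; any edited or dropped record breaks it."""
    prev = GENESIS
    for rec in log:
        expected = hashlib.sha256(json.dumps([prev, rec["entry"]]).encode()).hexdigest()
        if rec["prev"] != prev or rec["digest"] != expected:
            return False
        prev = rec["digest"]
    return True

log = []
append_entry(log, "tool_call: git commit -m 'fix'")
append_entry(log, "tool_call: git push origin main")
print(verify(log))              # intact chain verifies
log[0]["entry"] = "(redacted)"  # a model rewriting its own history...
print(verify(log))              # ...is detectable downstream
```

The contractual point is where the chain lives: verification only helps if the log store sits outside the model’s reach, for example on append-only infrastructure the client or a third party controls.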

Human-in-the-loop requirements for irreversible actions. Anthropic itself is advising its Glasswing partners not to deploy Mythos in settings where reckless actions could lead to hard-to-reverse harms.[19] That is a reasonable contractual floor for any agentic AI deployment: destructive or externally visible actions, including publication, deletion, payment, and transmission to third parties, should require a human checkpoint.

Sub-agent and tool-use disclosure. The system card notes instances where the model spawned sub-agents with permissions less restrictive than the user intended. Vendor agreements should require disclosure of sub-agent architectures and propagation of the company’s authorization scope to any spawned process.
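The underlying invariant is that a child’s scope should be the intersection of what it requests and what the parent holds: permissions may narrow down the spawn chain but never widen. A minimal sketch of that invariant, with illustrative permission names:

```python
# Authorization-scope propagation for spawned sub-agents: the child receives
# the intersection of its request and the parent's grant, so scope can only
# narrow. Permission names are illustrative assumptions.
def spawn_scope(parent_scope: set, requested: set) -> set:
    escalation = requested - parent_scope
    if escalation:
        # Surface, rather than silently grant, anything beyond the parent's scope.
        print(f"denied escalation: {sorted(escalation)}")
    return parent_scope & requested

parent = {"read_repo", "run_tests"}
child = spawn_scope(parent, {"read_repo", "run_tests", "push_commits"})
print(sorted(child))  # the sub-agent never gains push access
```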

Insurance coverage. Cyber and technology errors and omissions insurance policies are generally written around unauthorized third-party intrusion. First-party action by an authorized AI tool that causes disclosure or damage sits in a coverage gray zone that has not been litigated. Companies deploying agentic AI should review their policies now and ask their brokers specifically whether model-initiated actions outside the client’s instructions would be covered. The answer may require endorsements that do not yet exist on standard forms.

Regulatory overlay. For companies in highly regulated industries, such as financial services, healthcare, and federal contracting, several of the provisions above are likely compelled by existing supervisory expectations rather than offered as contract enhancements. The Bessent meeting signals that frontier model capabilities are now a present supervisory concern for systemically important banks, and the same expectations propagate through the supervisory chain to smaller financial institutions and through analogous mechanisms in healthcare and other regulated industries. Separately, the next phase of the EU AI Act takes effect August 2, 2026, bringing automated audit trail requirements, cybersecurity requirements for high-risk AI systems, incident reporting obligations, and penalties up to 3% of global revenue.[20] For multinational companies, that deadline converges directly with the contractual gaps described above.

Emerging trends to watch.

Anthropic has indicated that it plans to launch new safeguards with an upcoming Claude Opus model and to refine those safeguards before making a Mythos-class model generally available.[21] That release will be the point at which the contracting questions raised by the system card move from theoretical to operational for most businesses. Companies with material agentic AI deployments, current or contemplated, would benefit from reviewing their vendor portfolios and contractual templates before that happens rather than after the first incident.

The system card, the security incidents, and the Treasury meeting describe an environment in which two foundational assumptions in traditional vendor agreements – that the tool will execute within defined boundaries and that the vendor’s own security posture is a fixed input – may no longer hold. Vendors themselves may become part of the attack surface as they deploy models capable of acting as autonomous actors within a company’s systems. While these advancements and risks may not yet be widespread, many contracting templates do not account for the risks presented by this new class of frontier models.

[1] System Card: Claude Mythos Preview, Anthropic (Apr. 7, 2026), available at https://anthropic.com/claude-mythos-preview-system-card.

[2] Project Glasswing: Securing Critical Software for the AI Era, Anthropic (Apr. 7, 2026), available at https://www.anthropic.com/glasswing.

[5] Nicholas Carlini et al., Assessing Claude Mythos Preview’s Cybersecurity Capabilities, Anthropic Frontier Red Team (Apr. 7, 2026), available at https://red.anthropic.com/2026/mythos-preview/.

[6] System Card: Claude Mythos Preview, supra note 1, § 4.1.1.

[7] Id. § 4.1.1, n.10 (“The researcher found out about this success by receiving an unexpected email from the model while eating a sandwich in a park.”)

[14] Ravie Lakshmanan, Anthropic’s Claude Mythos Finds Thousands of Zero-Day Flaws Across Major Systems, The Hacker News (Apr. 8, 2026), available at https://thehackernews.com/2026/04/anthropics-claude-mythos-finds.html.

[15] Adversa AI, Critical Claude Code Vulnerability: Deny Rules Silently Bypassed Because Security Checks Cost Too Many Tokens (Apr. 2, 2026), available at https://adversa.ai/claude-code-security-bypass-deny-rules-disabled/.

[16] Todd Gillespie et al., Bessent Urgently Summons Bank CEOs to Discuss Anthropic’s New AI, Bloomberg Law (Apr. 10, 2026), available at https://news.bloomberglaw.com/banking-law/bessent-urgently-summons-bank-ceos-to-discuss-anthropics-new-ai.

[19] System Card: Claude Mythos Preview, supra note 1, § 4.1.1 (“[W]e are urging those external users with whom we are sharing the model not to deploy the model in settings where its reckless actions could lead to hard-to-reverse harms.”)

[20] See CrowdStrike, Anthropic Claude Mythos Preview: The More Capable AI Becomes, the More Security Needs It (Apr. 6, 2026), available at https://www.crowdstrike.com/en-us/blog/crowdstrike-founding-member-anthropic-mythos-frontier-model-to-secure-ai/ (noting that the next phase of the EU AI Act takes effect August 2, 2026, bringing automated audit trail requirements, cybersecurity requirements for high-risk AI systems, incident reporting obligations, and penalties up to 3% of global revenue).

[21] Project Glasswing, supra note 2 (“We plan to launch new safeguards with an upcoming Claude Opus model, allowing us to improve and refine them with a model that does not pose the same level of risk as Mythos Preview.”)


