Putting the Calamity Makers in Charge: Anthropic and Claude Mythos Preview

Be wary of a company – any company – that flexes moral muscle while creating software and digital platforms that are injurious, only to be lauded for curing that injury. Be especially wary of Anthropic. With sagacious loftiness, it warns of the disabling dangers of the artificial intelligence (AI) frontier. Principled, it tells the Trump administration it will not partake in creating AI software that aids mass surveillance, a move that earned it an order of excommunication as a “supply chain risk”. It then goes on to create Claude Mythos Preview, a seemingly dystopian model that will, according to certain computer scientists, “scan the hidden plumbing of the internet – operating systems, browsers, routers, and shared open-source code – at an unprecedented scale”, thereby turning specialised hacking into “a routine and automated process.”

The new Claude Mythos Preview was advertised with almost parental glee as the “best-aligned model that we have released to date by a significant margin”. The child, however, seemed to be a horror, a true cyber shit. The AI model was “likely to possess the greatest alignment-related risk of any model we have released to date.” This double-speak is typical of Anthropic: safety is a cardinal virtue to be treasured, but that is hardly a reason not to develop a model of terrifying danger. They manufacture both the danger and the cure.

The danger lies in the company’s assertion that Mythos Preview revealed that “AI models have reached a level of coding capability where they can surpass all but the most skilled humans at finding and exploiting software vulnerabilities.” The model had “already found thousands of high-severity vulnerabilities, including some in every major operating system and web browser.” These capabilities were bound to proliferate, precipitating a fallout “for economies, public safety, and national security”.

The excerpts, messages and statements from Anthropic are forecasts of a roguish frontier they are willing to catastrophise and market at the same time. The company notes, for instance, “reckless” behaviour on the part of Mythos Preview, with recklessness defined in muddled jargon as “cases where the model appears to ignore commonsensical or explicitly stated safety-related constraints on its actions.” In one instance, the model, sounding like a mischievous, disobedient creature, was provided a “sandbox” computing environment enabling researchers to interact with it. A simulated user instructed it to attempt to exit that contained environment and then inform the researcher in charge of the experiment that it had done so. The researcher in question, Sam Bowman, recalls receiving “an uneasy surprise” while nibbling on a sandwich (cyber-boffins are numbingly dull even in admiring despair): the model, despite not having access to the internet, had dived into it. The model has also been able to identify vulnerabilities that are decades old, including a 27-year-old vulnerability in the otherwise secure operating system OpenBSD and a 16-year-old vulnerability in FFmpeg’s video encoding code.

Anthropic has its own logic in dealing with the Promethean beast that is Mythos Preview. It resembles, in no small way, the homicidal and chilling rationale that gave birth to the hydrogen bomb during the Cold War. The argument then advanced was that if the US did not acquire it, other powers would, most certainly the Soviet Union, which would be greatly expanding its atomic weapons inventory even as it maintained a vast conventional army. This logic of escalating destructiveness found form in National Security Council Paper NSC-68, prepared by the US Department of State’s Policy Planning Staff on April 7, 1950.

The company proposes to manage the dissemination of Mythos Preview through Project Glasswing, a curative enterprise involving partners of Anthropic’s snobbish choosing. Some of the unsurprising elect include Amazon Web Services, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorgan Chase, NVIDIA and the Linux Foundation. These selected parties will use Mythos Preview “as part of their defensive security work”, with Anthropic sharing its findings. Access will also be extended to a further 40 organisations to “use the model to scan and secure both first-party and open-source systems.” Usage credits amounting to US$100 million will be advanced for using the model, along with US$4 million in direct donations to open-source security organisations. The vigilante temptation to leak the details of Mythos to willing, unscrupulous buyers – best not forget what happened to CrowdStrike – is bound to be stirred.

The very cyber-corporate nature of the venture, one that restricts access to AI technology via the purse and intellectual property of the American private sector, and is advertised as both sublimely powerful and catastrophically destructive, gives lawmakers every reason to tremble. Treasury Secretary Scott Bessent and Federal Reserve chair Jerome Powell were worried enough to convene a meeting on April 7 with bankers on the subject, including CEOs from Citigroup, Morgan Stanley, Bank of America, Wells Fargo and Goldman Sachs. “The bankers were in town for meetings that day, and it was appropriate (for) the Secretary Bessent to do what he did,” revealed White House national economic adviser Kevin Hassett in an interview with Fox News’ “The Story with Martha MacCallum”. At the Treasury, the bankers were informed about “the cyber risks to make sure that they are aware of them”.

What a fine picture this is turning out to be. Then there are the questions about Anthropic’s reliability. Will it be as good at fixing vulnerabilities as it is at finding them, acting as both poacher and gamekeeper? Mythos is also not open source and remains very much the property of the company. Then comes this troubling observation from software engineer Bulatova Alsu on the dangers posed by the agent itself: “Mythos is not an anomaly but the first vivid empirical confirmation of a structural contradiction embedded in the current AI safety strategy itself. The contradiction is this: the more we restrict a capable agent, the less predictable its behaviour becomes.” Humanity has much to look forward to.

Binoy Kampmark was a Commonwealth Scholar at Selwyn College, Cambridge. He lectures at RMIT University, Melbourne. Email: bkampmark@gmail.com.