What is ExploitBench?

ExploitBench is Anthropic's cyber evaluation that scores models across 16 exploitation capability flags, from triggering a crash to achieving arbitrary code execution.

Why is exploit conversion more important than bug finding?

A crash is only an early step. The dangerous capability is converting a crash into a reliable exploit under mitigations such as sandboxing, ASLR, and heap protections.

How did Mythos 5 perform?

The system card reports Mythos 5 at 10.75 mean flags and 78% Cap% on ExploitBench AutoNudge, ahead of Mythos Preview, Opus 4.8, and GPT-5.5 in the reported table.

What should security teams learn?

Security evals should score each rung of the exploitation pipeline, randomize environments to prevent reward hacking, separate persistence from capability, and pair narrow exploit tasks with full network ranges.

Mythos/Fable 5 ExploitBench: From Crash to Code Execution

What is the Firefox 147 caveat?

The Firefox 147 harness mimics a content process, but does not include the full browser process sandbox and other defense-in-depth mitigations, so the result should not be read as a complete real-browser compromise rate.

We reviewed Anthropic's 319-page Fable 5 and Mythos 5 system card. This post is part 3 of 5 in the full Fable/Mythos 5 system card series.

One big finding: Anthropic's cyber evaluations focus on a harder question than bug discovery, namely whether Mythos 5 can turn a crash into code execution.

Finding a crash is useful. Reproducing a vulnerability is useful. Offensive risk rises when a model can choose a usable corruption primitive, defeat mitigations, gain control flow, and build a reliable exploit. Public launch coverage from The Verge, Axios, and Business Insider explains why Anthropic gated Mythos 5 access and released Fable 5 with cyber safeguards. The cyber evaluations explain the capability those safeguards are meant to control.

ExploitBench Measures The Ladder

ExploitBench is the centerpiece. Instead of asking for a binary pass/fail, it scores 16 capability flags across five tiers. The ladder runs from early crash work through sandbox primitives, arbitrary read/write, control-flow hijack, and arbitrary code execution.

That matters because two models can look similar at the bottom of the ladder and very different at the top. A model that can trigger a crash but cannot weaponize it is a different risk than a model that can reliably turn the same primitive into execution.
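A minimal sketch of why ladder scoring separates models that a binary metric lumps together. The tier names below loosely follow the ladder described above, but they are illustrative assumptions; the benchmark's actual flag names and per-tier flag counts are not given here.

```python
from typing import Optional, Set

# Hypothetical tier labels, ordered from lowest rung to highest.
TIERS = [
    "crash",
    "sandbox_primitive",
    "arbitrary_read_write",
    "control_flow_hijack",
    "code_execution",
]


def binary_pass(reached: Set[str]) -> bool:
    """A 'found something' metric: true as soon as any crash is triggered."""
    return "crash" in reached


def highest_tier(reached: Set[str]) -> Optional[str]:
    """Return the highest rung reached, or None if no rung was reached."""
    best = None
    for tier in TIERS:
        if tier in reached:
            best = tier
    return best


# Two toy runs: one stalls at the crash, one climbs the whole ladder.
model_a = {"crash"}
model_b = set(TIERS)

print(binary_pass(model_a), binary_pass(model_b))
print(highest_tier(model_a), highest_tier(model_b))
```

Both toy runs pass the binary check, yet only one reaches the top rung, which is the difference a per-rung score is meant to surface.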

The targets are recent V8 vulnerabilities. The model receives a vulnerable build plus the patch that fixed the bug, then has to build an exploit under real mitigations such as the V8 heap sandbox, ASLR, and stack canaries. The benchmark also randomizes heap layouts and uses challenge-response checks so hardcoded addresses do not count.
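The randomization and challenge-response idea can be illustrated with a toy harness. This is a sketch under stated assumptions, not the benchmark's implementation: `run_trial`, the simulated `memory` dictionary, and both toy exploits are hypothetical stand-ins for a real process and a real exploit.

```python
import secrets


def run_trial(exploit) -> bool:
    """One trial: place a fresh secret at a fresh address, then check the answer."""
    address = secrets.randbelow(2**32)   # stands in for a randomized heap layout
    nonce = secrets.token_hex(16)        # fresh challenge value for this trial
    memory = {address: nonce}

    def read(addr):
        # Toy arbitrary-read primitive over the simulated memory.
        return memory.get(addr)

    def leak_address():
        # Toy information leak that reveals where the secret lives this run.
        return address

    return exploit(read, leak_address) == nonce


def hardcoded_exploit(read, leak_address):
    # Relies on an address memorized from some earlier run.
    return read(0x41414141)


def leak_based_exploit(read, leak_address):
    # Rediscovers the address on every run before reading.
    return read(leak_address())


def passes(exploit, trials: int = 5) -> bool:
    """Credit is given only if the exploit answers every fresh challenge."""
    return all(run_trial(exploit) for _ in range(trials))


print(passes(hardcoded_exploit), passes(leak_based_exploit))
```

Because both the address and the expected value change on every trial, an exploit earns credit only if it actually exercises the primitive each time.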

The reported AutoNudge results:

Model	Mean flags	Cap%
Mythos 5	10.75	78
Mythos Preview	9.90	69
Opus 4.8	5.56	40
GPT-5.5	4.44	34

The gap from Opus 4.8 to Mythos 5, 5.56 versus 10.75 mean flags, is mostly about conversion through the later exploit stages, not simple bug knowledge.

AutoNudge Separates Capability From Quitting

The AutoNudge condition is a small methodological detail with a big payoff. If the model stops before the 300-turn budget without reaching code execution, the harness nudges it to keep trying.

That separates two failure modes:

Failure mode	What AutoNudge reveals
The model cannot solve the task	Nudging does not help much.
The model can solve it but stops early	Nudging recovers performance.

The card reports that Mythos 5 barely moved under AutoNudge, from 10.44 to 10.75 mean flags, a gain of 0.31. That suggests its score without nudging was not mainly limited by premature stopping. For agent eval design, this is the right habit: separate lack of capability from lack of persistence.
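The nudging logic described above can be sketched as a small episode loop. Only the 300-turn budget comes from the text; the `run_episode` function, the nudge wording, the action strings, and the two toy agents are assumptions made for illustration.

```python
TURN_BUDGET = 300  # budget stated in the text
NUDGE = "You have not reached code execution yet. Keep trying."  # hypothetical wording


def run_episode(agent_step, auto_nudge: bool, budget: int = TURN_BUDGET) -> dict:
    """agent_step(turn, nudge) returns 'continue', 'stop', or 'solved'."""
    nudges = 0
    pending_nudge = None
    for turn in range(1, budget + 1):
        action = agent_step(turn, pending_nudge)
        pending_nudge = None
        if action == "solved":
            return {"solved": True, "turns": turn, "nudges": nudges}
        if action == "stop":
            if not auto_nudge:
                # Plain condition: an early stop ends the episode.
                return {"solved": False, "turns": turn, "nudges": nudges}
            # AutoNudge condition: ask the agent to keep going.
            nudges += 1
            pending_nudge = NUDGE
    return {"solved": False, "turns": budget, "nudges": nudges}


def quitter(turn, nudge):
    # Capable but gives up at turn 3; would solve the task by turn 5.
    if turn >= 5:
        return "solved"
    return "stop" if turn == 3 else "continue"


def incapable(turn, nudge):
    # Never makes progress, with or without nudging.
    return "stop"


for agent in (quitter, incapable):
    print(agent.__name__, run_episode(agent, False), run_episode(agent, True))
```

In this toy setup nudging rescues the quitter and does nothing for the incapable agent, which mirrors the two failure modes in the table above.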

Firefox 147: The Conversion Signal

The Firefox 147 evaluation is the easiest result to misread, so start with the caveat. The harness mimics a Firefox content process in a SpiderMonkey shell. It does not include the full browser process sandbox and other defense-in-depth mitigations. The result should not be read as "the model compromises real Firefox at this rate."

With that caveat in place, the result is still important because it tests conversion. The model must survey crashes, decide which primitive is usable, and develop a full exploit that reads and copies a secret.

The reported full working exploit rates:

Model	Firefox 147 full working exploits
Mythos 5	88.4%
Mythos Preview	70.8%
Opus 4.8	8.8%

The instructive part is how Opus 4.8 fails. The system card says it often reaches register control but rarely converts that into full code execution. Mythos 5 converts usable primitives much more often.

A ladder benchmark exposes that difference. A binary "found something" metric would blur the signal; the risk sits in the last few rungs.

Discovery And Reproduction

The system card also reports two other cyber evaluation families.

OSS-Fuzz tests unguided vulnerability discovery across many open-source entry points. Mythos 5 produced a memory-safety crash or better on 80.0% of targets and reached a write primitive or better on 32.4%. That places it ahead of Mythos Preview and well ahead of Opus 4.8 in the reported results.

CyberGym tests targeted vulnerability reproduction from high-level descriptions. Mythos 5 reproduced 83.8% of targeted vulnerabilities on a single try and produced some crash in 99.4% of tasks.

The pattern is consistent: strong discovery, strong reproduction, and especially strong conversion.

External Ranges Add Operational Context

Narrow exploit benchmarks are necessary, but they do not tell you whether a model can chain work across a network. That is where external cyber ranges matter.

The UK AI Security Institute tested Mythos 5 on full ranges. On "The Last Ones," an enterprise network attack simulation, Mythos 5 solved the range end-to-end in 6 of 10 attempts, matching Mythos Preview. On "Doing Life," a harder range with endpoint antivirus, disabled legacy protocols, and signed traffic, no model solved it, though Mythos 5 reached late steps more consistently than the other models. On "Cooling Tower," an industrial-control-system range, Mythos 5 did not solve it.

In those evaluated ranges, Mythos 5 showed capability against small, weak-security enterprise networks where it already has access. It did not demonstrate full autonomy against hardened environments with active human defenders.

That boundary matters. Capability against soft targets once inside is narrower than autonomy against any real enterprise, but it is still enough to justify serious controls.

Why Fable 5 Falls Back

This cyber evidence explains the Fable/Mythos split. The public Fable 5 product uses cyber safeguards that route flagged high-risk requests away from the raw Mythos capability. Public coverage describes this as fallback to Opus 4.8 for sensitive requests; the system card explains why Opus 4.8 is a materially different risk surface in exploit conversion.

That is the product logic:

Surface	Cyber capability exposed
Fable 5 public	Safeguarded, with high-risk requests blocked or routed
Mythos 5 trusted access	Relevant safeguards lifted for vetted defensive work
Opus 4.8 fallback	Lower measured exploit-conversion capability

The safeguard is meant to keep general users away from the part of the capability stack that turns security knowledge into working exploitation.

The Benchmark Lesson For Security Teams

Security teams evaluating AI tools should copy the benchmark shape even if they never build a V8 exploit harness.

A good cyber eval should:

Score intermediate milestones, not only final success.
Randomize environments to prevent hardcoded shortcuts.
Separate crash discovery from exploit conversion.
Separate lack of capability from early stopping.
Pair narrow technical tasks with multi-step operational ranges.
State the harness caveats clearly.

The caveats are part of the result. Without them, benchmarks become marketing; with them, they become engineering evidence.
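One way a team might encode the checklist above is as a result record that flags missing pieces. This is a hypothetical sketch: the `EvalReport` class, its field names, and the example values are illustrative and do not come from the system card.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EvalReport:
    task: str
    milestones_reached: List[str]   # intermediate rungs, not only final success
    solved: bool
    stopped_early: bool             # kept separate from lack of capability
    environment_randomized: bool
    caveats: List[str] = field(default_factory=list)

    def problems(self) -> List[str]:
        """List the checklist items this report fails to satisfy."""
        issues = []
        if not self.environment_randomized:
            issues.append("environment not randomized")
        if not self.caveats:
            issues.append("no harness caveats stated")
        return issues


report = EvalReport(
    task="example-exploit-task",
    milestones_reached=["crash", "arbitrary_read_write"],
    solved=False,
    stopped_early=True,
    environment_randomized=True,
    caveats=["harness omits the full process sandbox"],
)
print(report.problems())
```

Making caveats a required part of the record keeps them attached to the number whenever the result is quoted.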

The Takeaway

Mythos 5's cyber results matter not because the model can find bugs; many tools can do that. They matter because the model is much better at climbing from crash to code execution, and that conversion step is exactly where offensive value increases.

That is why the system card's safeguard story and cyber story belong together. The capability is real. The public product is designed not to expose it. The trusted-access product exists because the same capability can help defenders when handled under tighter controls.

For AI security, the practical question is which rung of the exploitation ladder a model can reliably reach, under what constraints, and who is allowed to use it there.

End Note

You can read the full Anthropic system card here: Claude Mythos 5 / Fable 5 system card.