OpenAI’s EVMbench AI Security Tool Tests AI Agents on Smart Contracts

OpenAI and Paradigm launch EVMbench to evaluate AI agents’ ability to identify and patch vulnerabilities in smart contracts after the Moonwell incident.

Rich by Coin

2026-02-19

OpenAI and Paradigm on Wednesday released a tool that evaluates AI agents’ ability to identify, patch, or exploit smart contract vulnerabilities. The tool, EVMbench, draws on 120 vulnerabilities identified across more than 40 prior smart contract audits, plus vulnerability scenarios from Paradigm’s forthcoming Tempo blockchain. The release follows an incident at Moonwell in which AI-generated code reportedly cost users nearly $2.7 million in crypto. A Moonwell software engineer said the code had passed an audit by crypto security firm Halborn.

EVMbench results show that GPT-5.3-Codex significantly outperformed earlier models at exploiting vulnerabilities, though its ability to detect and patch flaws remains incomplete. Anthropic’s Claude Opus 4.6 scored highest at detecting vulnerabilities, while GPT-5.3-Codex led in patching and exploiting them.