OpenAI and Paradigm on Wednesday released a tool that evaluates AI agents’ ability to identify, patch, or exploit smart contract vulnerabilities. The tool, EVMbench, draws from 120 vulnerabilities identified across more than 40 prior smart contract audits, plus vulnerability scenarios from Paradigm’s forthcoming Tempo blockchain. The release follows a Moonwell incident in which AI-generated code reportedly cost users nearly $2.7 million in crypto, with a Moonwell software engineer saying the code had passed an audit by crypto security firm Halborn.
EVMbench results show GPT-5.3-Codex significantly outperformed earlier models in exploiting vulnerabilities, while its ability to detect and patch flaws remains incomplete. Anthropic’s Claude Opus 4.6 scored highest in detecting vulnerabilities, and GPT-5.3-Codex led in patching and exploiting smart contracts.













