Curator

PaperBench: Evaluating AI’s Ability to Replicate AI Research

From OpenAI News · 2025-04-02 · Featured

Model Evaluation · AI Agent · Agent Framework

We introduce PaperBench, a benchmark evaluating the ability of AI agents to replicate state-of-the-art AI research.