Public Letter

Transparency about Third-Party AI Evaluations

Independent evaluations are becoming increasingly central to AI governance, but relying on a third party to carry out an evaluation does not on its own ensure quality or impartiality. When third parties publish evaluation results intended to provide independent accountability for a particular system, they should be able to demonstrate that they exercised independent judgment, had sufficient access to the systems they assessed, and acted transparently. Doing so will both increase trust in their work and advance the independent evaluation ecosystem overall. This includes disclosing at least:

  1. The methods used, and whether the evaluator controlled them. To build trust and facilitate independent review, evaluators should use state-of-the-art methods and be transparent about those methods and how they were chosen. Evaluations that seek to validate an AI developer’s own chosen methods have value, but evaluators can provide more robust results when they have the freedom to define the evaluation’s scope and methods.
  2. Whether the evaluator had editorial control. The results of an evaluation cannot be treated as genuinely independent if AI providers can edit or suppress findings because they are negative. Providers’ feedback can be valuable, and findings should be open to vigorous disagreement where warranted, but an independent evaluation should represent the evaluator’s authentic views, expressed without fear of retaliation.
  3. The amount of access and time the evaluator had with the system. Evaluators cannot conduct a trustworthy evaluation without sufficient access to the systems and information in question for a reasonable amount of time. While there is a legitimate need for AI providers and evaluators to safeguard intellectual property and user privacy, some system characteristics cannot be reliably evaluated without access to system versions and internals not available to the general public.
  4. Whether the evaluator had conflicts of interest. Third-party evaluations are not independent, nor will they be externally trusted, if they are unduly influenced by financial or organizational ties to AI providers. Evaluators should be transparent about such relationships, including whether they receive compensation or other resources from providers, and should take concrete steps to limit conflicts of interest, such as recusing conflicted staff from evaluations.

As AI systems become increasingly capable and widely deployed, there is a growing need for trustworthy, independent evaluations of their capabilities and risks. Third-party evaluators should therefore be transparent about the conditions they work under, such as by implementing frameworks like the AI Evaluator Forum's minimum conditions standard and related efforts. Governments, consumers, insurance providers, and the public at large should also demand transparency about independent evaluations, so that evaluation results can be interpreted accurately and fulfill their increasingly central role in AI governance.

Notable Signatories

Markus Anderljung

Director of Policy and Research, GovAI

Jacob Andreas

Associate Professor, MIT

Dean W. Ball

Senior Fellow, Foundation for American Innovation

Elizabeth Barnes

CEO, METR

Yoshua Bengio

Professor at Université de Montréal, Co-President and Scientific Director at LawZero, and Founder and Scientific Advisor, Mila - Quebec AI Institute

Stella Biderman

Executive Director, EleutherAI

Rishi Bommasani

Senior Research Scholar, Stanford University

Miles Brundage

Executive Director, AI Verification and Evaluation Research Institute

Ben Buchanan

Dmitri Alperovitch Assistant Professor at JHU and former White House Special Advisor for AI

Paulo Carvao

Senior Fellow, Mossavar-Rahmani Center for Business and Government at the Harvard Kennedy School

Michael Chen

Member of Policy Staff, METR

Yejin Choi

Professor, Stanford University

Rumman Chowdhury

CEO, Humane Intelligence Public Benefit Corporation

David Danks

Professor, UC San Diego

Rajiv Dattani

Co-founder, The AI Insurance Underwriting Company

Seth Donoughe

Director of AI, SecureBio

Rebecca Finlay

CEO, Partnership on AI

Jonas Freund

Senior Research Fellow, GovAI

Gillian Hadfield

Bloomberg Distinguished Professor of AI Alignment and Governance at Johns Hopkins University, Vector Institute

Daniel E. Ho

Professor, Stanford University

Aidan Homewood

Research Scholar, GovAI

Andrew Ilyas

Assistant Professor, CMU

Sayash Kapoor

PhD candidate at Princeton University and Senior Fellow, Mozilla

Saif M. Khan

Former Director for Technology and National Security at White House National Security Council

Sanmi Koyejo

Assistant Professor, Stanford University

Rayan Krishnan

CEO, Vals AI

Percy Liang

Associate Professor, Stanford University

Shayne Longpre

PhD Candidate, MIT

Sean McGregor

Lead of Engineering Research, AI Verification and Evaluation Research Institute

Arvind Narayanan

Professor, Princeton University

Christopher Painter

Policy Director, METR

Nathaniel Persily

Professor of Law, Stanford Law School

Emma Pierson

Assistant Professor, UC Berkeley

Rob Reich

Professor, Stanford University

Luca Righetti

Senior Research Fellow, GovAI

Sarah Schwettmann

Chief Scientist at Transluce and Research Scientist, MIT

Jaime Sevilla

Director, Epoch AI

Divya Siddarth

Executive Director, Collective Intelligence Project

Scott Singer

Fellow, Carnegie Endowment for International Peace

Ranjit Singh

Director, Data & Society Research Institute

Dawn Song

Professor, UC Berkeley

Joal Stein

Director of Operations and Communications, Collective Intelligence Project

Jacob Steinhardt

CEO at Transluce and Assistant Professor, UC Berkeley

Conrad Stosz

Head of Governance at Transluce and former Acting Director of the U.S. Center for AI Standards and Innovation

Charles Teague

CEO, Meridian Labs

Bri Treece

Co-founder, Fathom

