A Safe Harbor for Independent AI Evaluation
We propose that AI companies make simple policy changes to protect good faith research on their models, and promote safety, security, and trustworthiness of AI systems. We, the undersigned, represent members of the AI, legal, and policy communities with diverse expertise and interests. We agree on three things:
Independent evaluation is necessary for public awareness, transparency, and accountability of high impact generative AI systems.
Hundreds of millions of people have used generative AI in the last two years. It promises immense benefits, but also serious risks related to bias, alleged copyright infringement, and non-consensual intimate imagery. AI companies, academic researchers, and civil society agree that generative AI systems pose notable risks and that independent evaluation of these risks is an essential form of accountability.
Currently, AI companies’ policies can chill independent evaluation.
While companies’ terms of service deter malicious use, they offer no exemption for independent good faith research, leaving researchers at risk of account suspension or even legal reprisal. Whereas security research on traditional software benefits from established voluntary protections from companies (“safe harbors”), clear norms from vulnerability disclosure policies, and legal protections from the DOJ, trustworthiness and safety research on AI systems has few such protections. Independent evaluators fear account suspension (without an opportunity for appeal) and legal risks, both of which can chill research. While some AI companies now offer researcher access programs, which we applaud, these programs allow companies to select their own evaluators; they are a complement to, rather than a substitute for, the full range of diverse evaluations that might otherwise take place independently.
AI companies should provide basic protections and more equitable access for good faith AI safety and trustworthiness research.
Generative AI companies should avoid repeating the mistakes of social media platforms, many of which have effectively banned types of research aimed at holding them accountable by threatening legal action, sending cease-and-desist letters, or otherwise imposing chilling effects on research. In some cases, generative AI companies have already suspended researcher accounts and even changed their terms of service to deter some types of evaluation (discussed here). Disempowering independent researchers is not in AI companies’ own interests. To help protect users, we encourage AI companies to provide two levels of protection for research.
First, a legal safe harbor would indemnify good faith independent AI safety, security, and trustworthiness research, provided it is conducted in accordance with well-established vulnerability disclosure rules.
Second, companies should commit to more equitable access by using independent reviewers to moderate researchers’ evaluation applications, which would protect rule-abiding safety research from counterproductive account suspensions and mitigate the concern of companies selecting their own evaluators.
While these basic commitments will not solve every issue surrounding responsible AI today, they are an important first step on the long road towards building and evaluating AI in the public interest.
Additional reading on these ideas: a safe harbor for AI evaluation (by letter authors), algorithmic bug bounties, and credible third-party audits. (Signatures are for this letter, not the further reading.)
Paper: A Safe Harbor for AI Evaluation and Red Teaming
The paper "A Safe Harbor for AI Evaluation and Red Teaming" was authored by Shayne Longpre and colleagues and published on March 5, 2024. It argues for the critical importance of providing a safe harbor for independent evaluation and red teaming of artificial intelligence (AI) systems. In security, red teaming refers to a group authorized to emulate an adversary's attack against an organization's security systems. In AI, the term has been adopted to describe penetration testing aimed at uncovering a broader set of system flaws than traditional security testing targets.
The paper highlights that while independent evaluation and red teaming are essential for identifying risks posed by generative AI systems, the terms of service and enforcement strategies used by prominent AI companies can disincentivize good faith safety evaluations. This has led to concerns among researchers that conducting such research or releasing their findings might result in account suspensions or legal repercussions. Although some companies offer researcher access programs, these are seen as inadequate substitutes for independent research access due to limited community representation, inadequate funding, and lack of independence from corporate incentives.
The authors propose that major AI developers commit to providing a legal and technical safe harbor to indemnify public interest safety research and protect it from the threat of account suspensions or legal reprisal. These proposals draw on the authors' collective experience conducting safety, privacy, and trustworthiness research on generative AI systems, an area where norms and incentives could be better aligned with public interests without exacerbating model misuse.
The paper also discusses how to implement these protections to ensure inclusive and unimpeded community efforts in tackling the risks of generative AI. The authors suggest two main voluntary commitments: (i) a legal safe harbor to offer legal protections for good faith research conducted in line with vulnerability disclosure policies, and (ii) a technical safe harbor to protect safety researchers from having their accounts subject to moderation or suspension. These safe harbors should encompass research activities that uncover any system flaws, including all undesirable generations currently prohibited by the usage policy.
In conclusion, the paper emphasizes the importance of independent AI evaluation and proposes a series of recommendations to improve researchers' access, reduce fear of reprisals for safety research, and promote broader community participation. The authors hope that generative AI companies will adopt these commitments to establish better community norms, enhance trust in their services, and bolster much-needed AI safety in proprietary systems.