Comments on US AISI Managing Misuse Risk for Dual-Use Foundation Models (NIST AI 800-1)

September 9, 2024

The Machine Intelligence Research Institute (MIRI) has submitted a response to the U.S. Artificial Intelligence Safety Institute (AISI) and the National Institute of Standards and Technology (NIST) regarding their draft guidance on "Managing Misuse Risk for Dual-Use Foundation Models." We commend NIST and AISI for this important step toward addressing potential risks from increasingly capable AI systems.

We acknowledge the guidance's focus on misuse risks from malicious actors, while noting the need for future guidance on accident and misalignment risks. We appreciate the efforts to ensure AI developers consider key challenges in mapping and measuring misuse risks. However, we emphasize that the current state of understanding of AI systems and of threat modeling leaves significant uncertainty in how effectively these recommendations can be implemented, and we caution that AI evaluations may therefore create a false sense of security.

We suggest several improvements to the guidance, including being more explicit about the uncertainty in assessing AI capabilities and using more precise language that distinguishes between the current capabilities of AI risk management and its future goals. We stress that uncertainty often calls for greater caution, especially when dealing with powerful AI models.

Our recommendations include enhancing the documentation process by specifying the intended audience for each piece of documentation and making documentation public by default, with appropriate risk assessments for information sharing. We also suggest that risk thresholds should be subject to third-party review and include specific, quantitative measures.

We propose several modifications to the objectives and practices outlined in the guidance. These include adding recommendations for concrete "red lines" in AI development, developing measures to test the adequacy of safeguards, maintaining a track record of capability predictions, and establishing protocols for swift de-deployment of models found to be unacceptably dangerous.

Footnotes

  1. Executive Order 14110 defines a “dual-use foundation model” as “an AI model that is trained on broad data; generally uses self-supervision; contains at least tens of billions of parameters; is applicable across a wide range of contexts; and that exhibits, or could be easily modified to exhibit, high levels of performance at tasks that pose a serious risk to security, national economic security, national public health or safety, or any combination of those matters, such as by:
    1. (i) substantially lowering the barrier of entry for non-experts to design, synthesize, acquire, or use chemical, biological, radiological, or nuclear (CBRN) weapons;
    2. (ii) enabling powerful offensive cyber operations through automated vulnerability discovery and exploitation against a wide range of potential targets of cyberattacks; or
    3. (iii) permitting the evasion of human control or oversight through means of deception or obfuscation.” This definition is provided in a glossary at the end of this report, along with definitions of other key terms.
  2. As codified in 15 U.S.C. § 278h-1.
  3. Section 4.1(a)(ii) of Executive Order 14110 directs the Secretary of Commerce to “Establish appropriate guidelines (except for AI used as a component of a national security system), including appropriate procedures and processes, to enable developers of AI, especially of dual-use foundation models, to conduct AI red-teaming tests to enable deployment of safe, secure, and trustworthy systems. These efforts shall include: (A) coordinating or developing guidelines related to assessing and managing the safety, security, and trustworthiness of dual-use foundation models”.