Comments on NIST AI RMF: Generative Artificial Intelligence Profile (NIST AI 600-1)

June 1, 2024

The Machine Intelligence Research Institute (MIRI) has submitted a response to the National Institute of Standards and Technology (NIST) regarding the AI Risk Management Framework (RMF): Generative Artificial Intelligence Profile. We commend NIST's efforts in developing this profile as a crucial step towards managing the risks associated with rapidly advancing AI systems. Our response emphasizes the need to include a dedicated category for misalignment risks in the Profile's Risk List. We define misalignment risks as those arising when the objectives, actions, or behaviors of an AI system do not align with human values, intentions, or expectations. We stress that as AI systems approach and surpass human capabilities, these risks could threaten societal stability, economic security, and even human survival.

We highlight several factors that exacerbate misalignment risks, including our limited understanding of AI systems' internal workings, the unpredictability of emergent capabilities, and economic incentives for developing AI systems capable of long-term planning and increased autonomy.

Our primary recommendation is to add "Misalignment" as a separate category in the Profile's Risk List. We provide an overview of actions to mitigate misalignment risks, including rigorous evaluation of AI systems, ensuring human understanding of AI actions, implementing independent audits and external oversight, and requiring positive safety cases for powerful AI systems. We also offer specific recommendations for actions to include in the Profile, such as establishing protocols to determine whether humans understand AI actions, evaluating AI systems for agentic capabilities, and implementing measures to prevent AI systems from being stolen.