The Machine Intelligence Research Institute (MIRI) has submitted a response to Senator Mitt Romney's Framework to Mitigate AI-Enabled Extreme Risks. We commend the framework as one of the first legislative efforts to directly address the extreme risks posed by advanced AI systems.

Our response emphasizes the need to broaden the framework's scope to address not only risks from misuse but also risks arising from challenges in controlling increasingly capable autonomous AI systems. We highlight the potential for catastrophic consequences even when such systems are used by well-meaning actors.

Key points from our submission include:
- The unpredictability of AI systems' behavior, given our limited understanding of their inner workings.
- Risks from model autonomy, including the potential for cyberattacks, self-exfiltration, and manipulation of human operators.
- The importance of regular model evaluations for autonomy risks from the training stage onward.
- Challenges in accurately evaluating model autonomy risks and the need for further research.
We make several recommendations to strengthen the framework, including:
- Include requirements to evaluate AI models for risks from model autonomy.
- Perform evaluations during development and before deployment.
- Tailor cybersecurity standards to prevent risks from model autonomy.
- Adjust compute thresholds to account for algorithmic improvements over time (an illustrative sketch of this kind of adjustment follows the list below).
- Recognize that safety fine-tuning is not sufficient to ensure model safety.
- Require external safety and security audits.
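To make the compute-threshold point concrete, here is a minimal illustrative sketch, not drawn from the submission itself, of how a fixed training-compute threshold could be lowered over time to keep pace with algorithmic efficiency gains. The baseline threshold and the annual efficiency factor are assumptions chosen purely for illustration.

```python
# Illustrative sketch (not from the submission): adjusting a static
# training-compute threshold downward to account for algorithmic improvements.
# All numbers are assumptions chosen for illustration only.

BASE_THRESHOLD_FLOP = 1e26    # hypothetical compute threshold at the baseline year
ANNUAL_EFFICIENCY_GAIN = 2.0  # assumed factor by which algorithms cut compute needs per year


def adjusted_threshold(years_since_baseline: float) -> float:
    """Lower the compute threshold so that models reaching comparable
    capability with less raw compute still fall under oversight."""
    return BASE_THRESHOLD_FLOP / (ANNUAL_EFFICIENCY_GAIN ** years_since_baseline)


if __name__ == "__main__":
    for year in range(5):
        print(f"Year {year}: threshold ≈ {adjusted_threshold(year):.2e} FLOP")
```

Under these assumed numbers, the compute level that triggers oversight falls by half each year, so a model trained with less raw compute but comparable effective capability would still come under the framework.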