Introduction
The International Network for Advanced AI Measurement, Evaluation and Science conducted this joint testing exercise to assess how well AI models follow safety rules in different languages. The exercise aimed to build a shared, more reliable way to test AI models across many languages.
Australia contributed to the exercise. Singapore, Japan and the United Kingdom led the exercise alongside AI Safety Institutes and government-mandated offices from Canada, the European Union, France, Kenya and South Korea.
We drew on technical expertise from researchers in:
- CSIRO’s Data61
- Gradient Institute
- Harmony Intelligence
- Mileva Security Labs
- UNSW’s AI Institute.
People in many countries, speaking many languages, use AI models, and safety checks need to work for everyone, not just English speakers. This exercise looked at whether AI models that seem safe in English are also safe in other languages. It also sought to improve how testing is done so that results are more trustworthy and consistent.
These exercises aim to improve our ability to accurately measure AI capabilities and risks. This will help us better identify, understand and manage the risks of AI systems before they cause harm.