Improving international testing of foundation AI models

International Network for Advanced AI Measurement, Evaluation and Science joint testing exercise

Date published:

10 February 2025

Date updated:

1 June 2026

Publisher

AI Safety Institute

Introduction

The International Network for Advanced AI Measurement, Evaluation and Science conducted this joint testing exercise to assess how well AI models follow safety rules in different languages. The exercise aimed to build a shared, more reliable way to test AI models across many languages.

Australia contributed to the exercise. Singapore, Japan and the United Kingdom led the exercise alongside AI Safety Institutes and government-mandated offices from Canada, the European Union, France, Kenya and South Korea.

We drew on technical expertise from researchers in:

CSIRO’s Data61
Gradient Institute
Harmony Intelligence
Mileva Security Labs
UNSW’s AI Institute.

People in many countries use AI models in many languages, and safety checks need to work for everyone, not just English speakers. This exercise looked at whether AI models that seem safe in English are also safe in other languages. It also sought to improve testing methods so that results are more trustworthy and consistent.

These exercises aim to improve our ability to accurately measure AI capabilities and risks. This will help us better identify, understand and manage the risks of AI systems before they cause harm.

Read the report and key learnings

More information

Read about the Australian AI Safety Institute
