What is AI safety science?
AI technologies will be vital to Australia’s future economic growth, competitiveness and productivity. To capitalise on the opportunities of AI, we need to progress the science of AI safety.
AI safety science is a field of research, engineering and policy that measures and monitors emerging risks and capabilities of AI technologies. This includes frontier AI, defined as highly capable general-purpose AI models that can perform a wide variety of tasks and match or exceed the capabilities present in today’s most advanced models.
AI safety science aims to ensure that people develop and use AI systems in ways that are secure and align with human values.
We’re committed to strengthening Australia’s scientific understanding of the capabilities and risks associated with frontier AI systems to make them safer for everyone.
Participating in the International Network of AI Safety Institutes
Australia is a founding member of the International Network of AI Safety Institutes. We signed the Seoul Declaration in May 2024, confirming a commitment among countries to advance the science of AI safety, building on the Bletchley Declaration.
Current members of the network are:
- Australia
- Canada
- European Commission
- France
- Japan
- Kenya
- Republic of Korea
- Singapore
- United Kingdom
- United States.
The network has 3 workstreams:
- research to manage risks from AI-generated content
- testing frontier AI systems
- conducting risk assessments for frontier AI systems.
Our role
Our department leads Australia’s participation in the network, in line with the Seoul Declaration.
We’re bringing together technical AI experts across Australia and internationally. We meet regularly with the network’s members to progress activities under the 3 workstreams.
Research agenda
Australia and Canada are co-leading the research agenda on managing risks from AI-generated content. AI-generated content in text, audio, image and video is increasing and widely available. When deployed at scale, AI-generated content poses significant risks, including:
- creating harmful content
- facilitating fraud and impersonation
- undermining trust.
The research will look at ways to reduce these risks by:
- building safeguards into AI models
- evaluating and advancing techniques that show when content is AI-generated
- understanding the impacts of AI-generated content spreading online.
Joint testing exercises
Australia contributes to joint testing of frontier AI systems. These exercises aim to improve our ability to accurately measure AI capabilities and risks. This will help us better identify, understand and manage the risks of AI systems before they cause harm.
Australia has contributed to the following joint testing exercises:
- improving methods for AI model evaluation across 10 languages [PDF 375KB]
- improving methods to evaluate large language models (LLMs) across global languages [PDF 1.4MB]
- evaluating cybersecurity, fraud and privacy risks posed by frontier AI agents in different languages.
In these testing exercises, we draw on technical expertise from researchers in:
- CSIRO’s Data61
- the Gradient Institute
- Harmony Intelligence
- Mileva Security Labs
- UNSW’s AI Institute.
Further testing results will be available shortly.
Advancing international AI safety research
We bring together Australia’s AI experts to contribute to the independent International AI safety report.
The International AI safety report 2025 outlines current risks and capabilities of advanced AI systems. The 100 AI expert authors come from 33 countries and intergovernmental organisations.
The interim version of the next report will be available in September, and the final report in 2026.
Advancing domestic AI safety research
We are partnering with technical experts on 2 projects to better understand emerging risks from AI:
- The Gradient Institute is helping organisations to understand the risks of multiple AI agents (AI systems that can make decisions and use tools) interacting with each other.
- CSIRO’s Data61 is helping organisations to identify, analyse and manage risks from general purpose AI, including risk assessments and risk thresholds.
These reports will be available shortly.