Exploring AI safety: improving robustness, alignment, and fairness, and securing agentic AI.
Abstract
We present our research on AI safety, focusing on advances that enhance the robustness, alignment, and fairness of large language models (LLMs) and agentic systems. We examine key alignment challenges and safety considerations, analyzing both the capabilities and the inherent limitations of LLMs, particularly in relation to instruction prefix tuning and its theoretical constraints for achieving reliable alignment.
We then turn to security risks in agentic systems, including prompt injection and system hijacking, ranging from attacks on web-based agents (such as malicious image-based zero-click attacks on social media) to vulnerabilities in multi-agent settings. Finally, we discuss methodologies for evaluating safety progress and robustness in these systems.
What you will learn
Explore related talks that complement this research

TECHNICAL
Designing Models That Scale
LLM routing
Model selection optimisation
Navigating Through the LLM Zoo: How to Find the Best Model?
Optimising LLM routing and offloading for cost, quality, and SLA guarantees.

TECHNICAL
Agents That Actually Work
Agentic AI
AI safety and robustness
The Building Blocks of Agentic Systems
Understand the core components of Agentic AI and the design choices needed to build reliable, usable systems.

TECHNICAL
Knowing When AI Fails
AI safety
LLM alignment
Securing Agentic AI: Robustness, Alignment, Fairness
Exploring AI safety: improving robustness, alignment, and fairness, and securing agentic AI.

TECHNICAL
Agents That Actually Work
Agentic AI
Autonomous AI systems
Building Agentic Systems that Improve in Production
Building agentic systems that autonomously learn, adapt, and improve in production.