Securing Agentic AI: Robustness, Alignment, Fairness

Exploring AI safety: improving robustness, alignment, and fairness, and securing agentic AI.

Abstract

We present our research on AI safety, focusing on advances that enhance the robustness, alignment, and fairness of large language models (LLMs) and agentic systems. We examine key alignment challenges and safety considerations, analyzing both the capabilities and the inherent limitations of LLMs, particularly in relation to instruction prefix tuning and its theoretical constraints for achieving reliable alignment.

We then turn to security risks in agentic systems, including prompt injection and system hijacking, ranging from attacks on web-based agents (such as malicious image-based zero-click attacks on social media) to vulnerabilities in multi-agent settings. Finally, we discuss methodologies for evaluating safety progress and robustness in these systems.

