Navigating Through the LLM Zoo: How to Find the Best Model?

Optimising LLM routing and offloading for cost, quality, and SLA guarantees.

Open-weight large language model (LLM) zoos provide access to numerous high-quality models, but selecting the appropriate model for specific tasks remains challenging and requires technical expertise. Most users simply want factually correct, safe, and satisfying responses without concerning themselves with model technicalities, while inference service providers prioritize minimizing operating costs.

Abstract

In this talk, I will share some of our recent progress towards choosing the best model in the presence of such competing interests. I will first introduce MESS+, a stochastic optimization algorithm for cost-optimal LLM request routing while providing rigorous service level agreement (SLA) compliance guarantees....
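To make the idea of cost-aware routing under a quality target concrete, here is a minimal sketch. It is not the MESS+ algorithm, whose details are not given in this overview; it swaps in a generic virtual-queue (drift-plus-penalty) rule from stochastic optimization. The model names, costs, quality scores, and the `v` trade-off parameter are all hypothetical, and the quality scores are fixed numbers here, whereas a real system would have to predict them per request.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    cost: float     # hypothetical cost per request
    quality: float  # hypothetical predicted quality in [0, 1]


def route(models, queue, v):
    """Pick the model minimising v * cost - queue * quality."""
    return min(models, key=lambda m: v * m.cost - queue * m.quality)


def simulate(models, target, v=0.1, steps=2000):
    """Route `steps` requests; return (average cost, average quality)."""
    queue = 0.0  # virtual queue: accumulated quality shortfall against the target
    total_cost = total_quality = 0.0
    for _ in range(steps):
        chosen = route(models, queue, v)
        total_cost += chosen.cost
        total_quality += chosen.quality
        # The queue grows when quality falls below the target and drains otherwise.
        queue = max(0.0, queue + target - chosen.quality)
    return total_cost / steps, total_quality / steps


# Illustrative two-model zoo (assumed values, not measurements).
ZOO = [Model("small", 1.0, 0.6), Model("large", 10.0, 0.9)]
avg_cost, avg_quality = simulate(ZOO, target=0.8)
print(f"average cost {avg_cost:.2f}, average quality {avg_quality:.3f}")
```

The queue acts as a running debt against the quality target: while the debt is small the cheap model wins, and once it grows the expensive model is chosen until the debt is paid down. A larger `v` favours cost and lets the debt grow further before the router switches.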

Then, considering more sophisticated systems that combine lightweight local LLMs for processing simple tasks at high speed with large-scale cloud LLMs for handling multi-modal data sources, I will present TMO, a local-cloud LLM inference system with "Three-M" Offloading: Multi-modal, Multi-task, and Multi-dialogue. TMO uses a reinforcement learning (RL) strategy to choose, for each task or dialogue, where inference runs and which multi-modal data sources to use, aiming to maximize the long-term reward while adhering to resource constraints.
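The offloading decision can be illustrated with a toy learner. This sketch is a deliberate simplification and not TMO's method: it uses an epsilon-greedy contextual bandit, which optimizes immediate rather than long-term reward, and it folds the resource constraint into a fixed cost penalty. The contexts, actions, quality table, costs, and `COST_WEIGHT` are all hypothetical values chosen for illustration.

```python
import random

CONTEXTS = ("simple", "complex")
ACTIONS = ("local", "cloud_text", "cloud_multimodal")

# Hypothetical response quality for each (context, action) pair.
QUALITY = {
    ("simple", "local"): 0.90,
    ("simple", "cloud_text"): 0.95,
    ("simple", "cloud_multimodal"): 0.95,
    ("complex", "local"): 0.30,
    ("complex", "cloud_text"): 0.60,
    ("complex", "cloud_multimodal"): 0.90,
}
# Hypothetical resource cost of each action.
COST = {"local": 0.1, "cloud_text": 0.5, "cloud_multimodal": 1.0}
COST_WEIGHT = 0.3  # assumed penalty standing in for the resource constraint


def reward(context, action):
    """Quality minus a weighted resource cost."""
    return QUALITY[(context, action)] - COST_WEIGHT * COST[action]


def train(steps=5000, epsilon=0.2, seed=0):
    """Learn a context -> action policy with epsilon-greedy sample averages."""
    rng = random.Random(seed)
    value = {(c, a): 0.0 for c in CONTEXTS for a in ACTIONS}
    count = dict.fromkeys(value, 0)
    for _ in range(steps):
        context = rng.choice(CONTEXTS)
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)  # explore
        else:
            action = max(ACTIONS, key=lambda a: value[(context, a)])  # exploit
        key = (context, action)
        count[key] += 1
        # Incremental sample-average update of the estimated reward.
        value[key] += (reward(context, action) - value[key]) / count[key]
    return {c: max(ACTIONS, key=lambda a: value[(c, a)]) for c in CONTEXTS}


policy = train()
print(policy)
```

Under these assumed numbers the learned policy keeps simple requests on the local model and sends complex ones to the cloud with multi-modal inputs, because the extra quality outweighs the cost penalty only in the complex case.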
