
Route Configuration

Routes define which provider handles which type of request. Admin Bud-E supports four service types:

  • LLM — Large Language Models (text generation)
  • VLM — Vision-Language Models (text + image understanding)
  • TTS — Text-to-Speech (audio generation)
  • ASR — Automatic Speech Recognition (audio transcription)

Each route specifies:

  • Service type (LLM/VLM/TTS/ASR)
  • Provider (which provider to use)
  • Model (specific model identifier)
  • Priority (for failover: lower numbers = higher priority)
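A route can be modeled as a small record. The following is a minimal sketch for illustration only; the field names are ours, not Admin Bud-E's actual schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    service: str   # "LLM", "VLM", "TTS", or "ASR"
    provider: str  # name of a configured provider, e.g. "vertex"
    model: str     # provider-specific model identifier
    priority: int  # lower number = tried first

# Two LLM routes: Vertex primary, Together as failover
routes = [
    Route("LLM", "vertex", "gemini-1.5-flash", 1),
    Route("LLM", "together", "meta-llama/Llama-3-70b-chat-hf", 2),
]
```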

How Routes Work

When a request comes in:

  1. Admin Bud-E identifies the service type (LLM, VLM, TTS, or ASR)
  2. Finds all routes for that type, sorted by priority (ascending)
  3. Tries the highest priority route (lowest number) first
  4. If that fails with a retryable error (429, 5xx), tries the next priority
  5. Continues until success or all routes exhausted

This gives you automatic failover without client-side changes.
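The steps above can be sketched as a dispatch loop. This is a simplified model of the behavior described, not Admin Bud-E's actual code; `send(route)` is a hypothetical helper that performs one provider call and returns an HTTP status plus body:

```python
RETRYABLE = {429, 500, 502, 503, 504}

def dispatch(service, routes, send):
    """Try routes for `service` in priority order; fail over on retryable errors."""
    # Step 2: all routes for this service type, sorted by priority (ascending)
    candidates = sorted(
        (r for r in routes if r["service"] == service),
        key=lambda r: r["priority"],
    )
    if not candidates:
        raise LookupError(f"No route found for service type {service}")
    last = None
    for route in candidates:  # Steps 3-5: lowest priority number first
        status, body = send(route)
        if status == 200:
            return body
        last = (route, status)
        if status not in RETRYABLE:
            break  # 400/401/404 etc.: fail immediately, no failover
    route, status = last
    raise RuntimeError(f"{route['provider']} failed with HTTP {status}")
```

For example, if the priority-1 provider returns 429, the loop falls through to priority 2; a 401 from the first provider ends the request immediately.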

Adding a Route

  1. Navigate to Routes in the Admin UI
  2. Click Add Route
  3. Fill in:
    • Type: Select service type (LLM/VLM/TTS/ASR)
    • Provider: Select from your configured providers
    • Model: Model identifier (provider-specific)
    • Priority: Number (lower = higher priority; 1 is tried first)
  4. Click Save

Priority System

Priority determines the order routes are tried.

Rules:

  • Lower numbers = higher priority = tried first
  • If multiple routes have the same priority, order is undefined
  • Gaps are allowed (you can use priorities 1, 5, 10)
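These rules reduce to a simple sort on the priority number; a small illustration (provider names are just placeholders):

```python
# Priorities may have gaps (1, 5, 10); only the relative order matters.
routes = [("mistral", 10), ("vertex", 1), ("together", 5)]
try_order = [name for name, prio in sorted(routes, key=lambda r: r[1])]
# → ["vertex", "together", "mistral"]
```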

Common patterns:

Single Provider (No Failover)

LLM Routes:

  • Route 1: Vertex Gemini, priority 1

All LLM requests go to Vertex. If it fails, the request fails.

Two Providers (Simple Failover)

LLM Routes:

  • Route 1: Vertex Gemini, priority 1
  • Route 2: Together Llama, priority 2

Tries Vertex first. On rate-limit or server error, falls back to Together.

Three Providers (Full Redundancy)

LLM Routes:

  • Route 1: Vertex Gemini, priority 1
  • Route 2: Together Llama, priority 2
  • Route 3: Mistral Large, priority 3

Maximum availability. Request succeeds unless all three fail.

Failover Behavior

Retryable Errors

Admin Bud-E automatically retries the next route when it sees:

  • 429 Too Many Requests (rate limit)
  • 5xx Server Errors (503, 502, 500, etc.)
  • Network timeouts

Non-Retryable Errors

These errors do not trigger failover:

  • 400 Bad Request (malformed input)
  • 401 Unauthorized (invalid API key)
  • 404 Not Found (model doesn't exist)
  • Any other 4xx client error (everything except 429)

The request fails immediately with the error message.
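The retry decision boils down to one predicate. A sketch mirroring the rules above (this is our restatement, not Admin Bud-E's source):

```python
def is_retryable(status: int) -> bool:
    """Whether a failed attempt should fall through to the next route.

    429 and any 5xx trigger failover; all other 4xx fail immediately.
    (Network timeouts never produce a status and are treated as
    retryable by the caller.)
    """
    return status == 429 or 500 <= status <= 599
```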

TIP

Non-retryable errors usually indicate configuration problems. Check your provider settings and model names.

Model Identifiers

Model names are provider-specific. Check each provider's documentation:

Vertex AI (Google)

LLM/VLM:

  • gemini-1.5-flash (fast, cost-effective)
  • gemini-1.5-pro (most capable)
  • gemini-1.0-pro (legacy)

TTS:

  • en-US-Neural2-C (the Neural2 voice used in the examples below; see Google Cloud's voice list for the full set)
ASR:

  • default (standard model)
  • latest_long (optimized for longer audio)

Together AI

Example models:

  • meta-llama/Llama-3-70b-chat-hf
  • mistralai/Mixtral-8x7B-Instruct-v0.1
  • togethercomputer/CodeLlama-34b-Instruct

See Together's model list.

Mistral AI

Models:

  • mistral-large-latest (most capable)
  • mistral-medium-latest (balanced)
  • mistral-small-latest (fast, efficient)

OpenAI

Models:

  • gpt-4-turbo
  • gpt-4
  • gpt-3.5-turbo

Route Examples

Example 1: Vertex-Only Setup

Goal: Use only Google Vertex AI for everything.

Routes:

  • LLM: Vertex, gemini-1.5-flash, priority 1
  • VLM: Vertex, gemini-1.5-pro, priority 1
  • TTS: Vertex, en-US-Neural2-C, priority 1
  • ASR: Vertex, default, priority 1

Example 2: Cost-Optimized with Failover

Goal: Use cheapest provider first, fall back to premium on rate limits.

LLM Routes:

  • Together, meta-llama/Llama-3-70b-chat-hf, priority 1 (cheap, fast)
  • Vertex, gemini-1.5-flash, priority 2 (fallback)
  • Mistral, mistral-large-latest, priority 3 (last resort)

VLM Routes:

  • Vertex, gemini-1.5-flash, priority 1 (multimodal)

Example 3: Geographic Redundancy

Goal: Try EU provider first, fall back to US if needed.

LLM Routes:

  • Mistral, mistral-large-latest, priority 1 (EU)
  • Vertex EU, gemini-1.5-pro, priority 2 (EU)
  • Together, meta-llama/Llama-3-70b-chat-hf, priority 3 (may be US)

Editing Routes

To change an existing route:

  1. Navigate to Routes
  2. Click Edit on the route
  3. Update fields
  4. Click Save

Changes take effect immediately.

Deleting Routes

To remove a route:

  1. Navigate to Routes
  2. Click Delete on the route
  3. Confirm deletion

WARNING

If you delete all routes for a service type (e.g., all LLM routes), requests of that type will fail.

Debugging Routes

Problem: "No route found for service type LLM"

Cause: No LLM routes exist.

Solution: Add at least one LLM route.

Problem: "Provider 'vertex' not found"

Cause: Route references a provider that doesn't exist or has a typo.

Solution:

  1. Check provider name exactly matches (case-sensitive)
  2. For Vertex, ensure name is exactly vertex (lowercase)
  3. Verify provider exists in Providers page

Problem: All requests fail with 429

Cause: Rate limit on primary provider, no failover configured.

Solution: Add a second provider with priority 2.

Problem: Wrong model being used

Cause: Route priority is incorrect.

Solution: Check priorities — lower number = higher priority.

Testing Routes

After configuring routes:

  1. Make a test request from the frontend
  2. Check Usage → detailed logs
  3. Verify:
    • Correct provider was used
    • Correct model was invoked
    • Credits match expected pricing

If failover occurred, you'll see multiple entries (one per attempt).

Performance Considerations

Latency

Each failover attempt adds latency:

  • Primary failure: ~5-30 seconds (depends on timeout)
  • Secondary attempt: additional ~5-30 seconds

Best practices:

  • Use fast providers as primary
  • Set reasonable timeouts
  • Monitor failure rates

Cost

Failover attempts may be billed by the provider even if they fail:

  • 429 errors usually don't incur charges
  • Partial responses (timeouts) might incur charges
  • Check provider billing for details

Rate Limits

If you're hitting rate limits often:

  1. Request quota increases from provider
  2. Add more providers to spread load
  3. Implement client-side rate limiting
  4. Use slower/cheaper models with higher quotas

Advanced Patterns

Load Balancing

For high traffic, alternate between providers:

LLM Routes:

  • Vertex, gemini-1.5-flash, priority 1
  • Together, meta-llama/Llama-3-70b-chat-hf, priority 1

Both have the same priority, so the middleware picks one (implementation-defined, typically round-robin or random).

WARNING

Load balancing behavior with equal priorities is not guaranteed. For explicit control, use different priorities.
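One way to picture the equal-priority case is round-robin over the tied routes. This is a hypothetical sketch of one possible tie-breaking strategy, not guaranteed Admin Bud-E behavior:

```python
import itertools

def round_robin(routes):
    """Cycle over the routes sharing the lowest (best) priority."""
    top = min(r["priority"] for r in routes)
    peers = [r for r in routes if r["priority"] == top]
    return itertools.cycle(peers)

routes = [
    {"provider": "vertex", "priority": 1},
    {"provider": "together", "priority": 1},
]
picker = round_robin(routes)
first = next(picker)["provider"]   # "vertex"
second = next(picker)["provider"]  # "together"
```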

Cost vs. Quality Tiers

LLM Routes:

  • Together, meta-llama/Llama-3-8b-chat-hf, priority 1 (cheap, fast, decent)
  • Vertex, gemini-1.5-flash, priority 2 (more capable, more expensive)
  • Vertex, gemini-1.5-pro, priority 3 (best quality, most expensive)

Most requests use the cheap model. On rate limits or retryable server errors, requests step up to the more capable models.

Model-Specific Routing

Different models for different use cases:

Text-only (LLM):

  • Vertex, gemini-1.5-flash, priority 1

Multimodal/Images (VLM):

  • Vertex, gemini-1.5-pro, priority 1 (better vision understanding)

Long context:

  • Together, togethercomputer/CodeLlama-34b-Instruct, priority 1

Next Steps