Route Configuration
Routes define which provider handles which type of request. Admin Bud-E supports four service types:
- LLM — Large Language Models (text generation)
- VLM — Vision-Language Models (text + image understanding)
- TTS — Text-to-Speech (audio generation)
- ASR — Automatic Speech Recognition (audio transcription)
Each route specifies:
- Service type (LLM/VLM/TTS/ASR)
- Provider (which provider to use)
- Model (specific model identifier)
- Priority (for failover: lower numbers = higher priority)
How Routes Work
When a request comes in:
- Admin Bud-E identifies the service type (LLM, VLM, TTS, or ASR)
- Finds all routes for that type, sorted by priority (ascending)
- Tries the highest priority route (lowest number) first
- If that fails with a retryable error (429, 5xx), tries the next priority
- Continues until success or all routes exhausted
This gives you automatic failover without client-side changes.
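The failover flow above can be sketched in a few lines of Python. This is a minimal illustration only; the route dictionaries and the `dispatch`/`call_provider` names are hypothetical, not Admin Bud-E's actual internals:

```python
# Minimal sketch of priority-based failover (hypothetical data model,
# not Admin Bud-E's actual implementation).

RETRYABLE = {429, 500, 502, 503, 504}

def dispatch(routes, service_type, call_provider):
    """Try all routes for service_type in ascending priority order."""
    candidates = sorted(
        (r for r in routes if r["type"] == service_type),
        key=lambda r: r["priority"],
    )
    if not candidates:
        raise RuntimeError(f"No route found for service type {service_type}")
    last_error = None
    for route in candidates:
        status, body = call_provider(route)
        if status == 200:
            return body                      # success: stop here
        if status not in RETRYABLE:
            # 400/401/404/etc.: fail immediately, no failover
            raise RuntimeError(f"{status} from {route['provider']}")
        last_error = status                  # retryable: try next priority
    raise RuntimeError(f"All routes exhausted (last error: {last_error})")
```

The key point is that the client calls one endpoint and never sees which provider ultimately served the request.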
Adding a Route
- Navigate to Routes in the Admin UI
- Click Add Route
- Fill in:
- Type: Select service type (LLM/VLM/TTS/ASR)
- Provider: Select from your configured providers
- Model: Model identifier (provider-specific)
- Priority: Number (lower number = higher priority; 1 is tried first)
- Click Save
Priority System
Priority determines the order routes are tried.
Rules:
- Lower numbers = higher priority = tried first
- If multiple routes have the same priority, order is undefined
- Gaps are allowed (you can use priorities 1, 5, 10)
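The gap rule can be checked with a quick sort (illustrative route dicts, assuming priorities sort ascending as described above):

```python
# Routes with gapped priorities (1, 5, 10) are tried in the same order
# as 1, 2, 3 — only the relative order matters.
routes = [
    {"provider": "mistral", "priority": 10},
    {"provider": "vertex", "priority": 1},
    {"provider": "together", "priority": 5},
]
order = [r["provider"] for r in sorted(routes, key=lambda r: r["priority"])]
print(order)  # ['vertex', 'together', 'mistral']
```

Gaps are handy when you expect to slot in new providers later without renumbering.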
Common patterns:
Single Provider (No Failover)
LLM Routes:
- Route 1: Vertex Gemini, priority 1
All LLM requests go to Vertex. If it fails, the request fails.
Two Providers (Simple Failover)
LLM Routes:
- Route 1: Vertex Gemini, priority 1
- Route 2: Together Llama, priority 2
Tries Vertex first. On rate-limit or server error, falls back to Together.
Three Providers (Full Redundancy)
LLM Routes:
- Route 1: Vertex Gemini, priority 1
- Route 2: Together Llama, priority 2
- Route 3: Mistral Large, priority 3
Maximum availability. Request succeeds unless all three fail.
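To see why redundancy helps, here is back-of-the-envelope availability math, assuming independent failures and an illustrative 99% per-provider availability (real figures vary by provider):

```python
# A request only fails when every configured provider is down at once,
# so combined availability is 1 - (failure_rate ** n_providers).
per_provider_availability = 0.99
for n in (1, 2, 3):
    combined = 1 - (1 - per_provider_availability) ** n
    print(f"{n} provider(s): {combined:.6f}")
```

With these assumed numbers, a third provider takes you from "four nines" to roughly "six nines".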
Failover Behavior
Retryable Errors
Admin Bud-E automatically retries the next route when it sees:
- 429 Too Many Requests (rate limit)
- 5xx Server Errors (503, 502, 500, etc.)
- Network timeouts
Non-Retryable Errors
These errors do not trigger failover:
- 400 Bad Request (malformed input)
- 401 Unauthorized (invalid API key)
- 404 Not Found (model doesn't exist)
- Other 4xx client errors (anything except 429)
The request fails immediately with the error message.
TIP
Non-retryable errors usually indicate configuration problems. Check your provider settings and model names.
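The retryable/non-retryable split above boils down to one predicate. This helper is illustrative, not part of Admin Bud-E's API:

```python
def is_retryable(status_code):
    """True if a failed attempt should fall through to the next route.

    Mirrors the rules above: 429 and any 5xx trigger failover;
    other 4xx client errors fail immediately.
    """
    return status_code == 429 or 500 <= status_code <= 599

print(is_retryable(429))  # True
print(is_retryable(503))  # True
print(is_retryable(401))  # False
```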
Model Identifiers
Model names are provider-specific. Check each provider's documentation:
Vertex AI (Google)
LLM/VLM:
- `gemini-1.5-flash` (fast, cost-effective)
- `gemini-1.5-pro` (most capable)
- `gemini-1.0-pro` (legacy)
TTS:
- `en-US-Neural2-C` (female voice)
- `en-US-Neural2-D` (male voice)
- See Cloud TTS voices
ASR:
- `default` (standard model)
- `latest_long` (optimized for longer audio)
Together AI
Example models:
- `meta-llama/Llama-3-70b-chat-hf`
- `mistralai/Mixtral-8x7B-Instruct-v0.1`
- `togethercomputer/CodeLlama-34b-Instruct`
Mistral AI
Models:
- `mistral-large-latest` (most capable)
- `mistral-medium-latest` (balanced)
- `mistral-small-latest` (fast, efficient)
OpenAI
Models:
- `gpt-4-turbo`
- `gpt-4`
- `gpt-3.5-turbo`
Route Examples
Example 1: Vertex-Only Setup
Goal: Use only Google Vertex AI for everything.
Routes:
- LLM: Vertex, `gemini-1.5-flash`, priority 1
- VLM: Vertex, `gemini-1.5-pro`, priority 1
- TTS: Vertex, `en-US-Neural2-C`, priority 1
- ASR: Vertex, `default`, priority 1
Example 2: Cost-Optimized with Failover
Goal: Use cheapest provider first, fall back to premium on rate limits.
LLM Routes:
- Together, `meta-llama/Llama-3-70b-chat-hf`, priority 1 (cheap, fast)
- Vertex, `gemini-1.5-flash`, priority 2 (fallback)
- Mistral, `mistral-large-latest`, priority 3 (last resort)
VLM Routes:
- Vertex, `gemini-1.5-flash`, priority 1 (multimodal)
Example 3: Geographic Redundancy
Goal: Try EU provider first, fall back to US if needed.
LLM Routes:
- Mistral, `mistral-large-latest`, priority 1 (EU)
- Vertex EU, `gemini-1.5-pro`, priority 2 (EU)
- Together, `meta-llama/Llama-3-70b-chat-hf`, priority 3 (may be US)
Editing Routes
To change an existing route:
- Navigate to Routes
- Click Edit on the route
- Update fields
- Click Save
Changes take effect immediately.
Deleting Routes
To remove a route:
- Navigate to Routes
- Click Delete on the route
- Confirm deletion
WARNING
If you delete all routes for a service type (e.g., all LLM routes), requests of that type will fail.
Debugging Routes
Problem: "No route found for service type LLM"
Cause: No LLM routes exist.
Solution: Add at least one LLM route.
Problem: "Provider 'vertex' not found"
Cause: Route references a provider that doesn't exist or has a typo.
Solution:
- Check that the provider name matches exactly (case-sensitive)
- For Vertex, ensure the name is exactly `vertex` (lowercase)
- Verify the provider exists on the Providers page
Problem: All requests fail with 429
Cause: Rate limit on primary provider, no failover configured.
Solution: Add a second provider with priority 2.
Problem: Wrong model being used
Cause: Route priority is incorrect.
Solution: Check priorities — lower number = higher priority.
Testing Routes
After configuring routes:
- Make a test request from the frontend
- Check Usage → detailed logs
- Verify:
- Correct provider was used
- Correct model was invoked
- Credits match expected pricing
If failover occurred, you'll see multiple entries (one per attempt).
Performance Considerations
Latency
Each failover attempt adds latency:
- Primary failure: ~5-30 seconds (depends on timeout)
- Secondary attempt: additional ~5-30 seconds
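The cumulative effect is worth spelling out. A back-of-the-envelope worst case, using illustrative numbers from the ranges above:

```python
# With a 30 s timeout per attempt and three routes, a request that
# exhausts every route can hang for the full timeout on each one.
timeout_s = 30
n_routes = 3
worst_case_s = timeout_s * n_routes
print(f"Worst case with {n_routes} routes at {timeout_s}s timeouts: {worst_case_s}s")  # 90s
```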
Best practices:
- Use fast providers as primary
- Set reasonable timeouts
- Monitor failure rates
Cost
Failover attempts may be billed by the provider even if they fail:
- 429 errors usually don't incur charges
- Partial responses (timeouts) might incur charges
- Check provider billing for details
Rate Limits
If you're hitting rate limits often:
- Request quota increases from provider
- Add more providers to spread load
- Implement client-side rate limiting
- Use slower/cheaper models with higher quotas
Advanced Patterns
Load Balancing
For high traffic, alternate between providers:
LLM Routes:
- Vertex, `gemini-1.5-flash`, priority 1
- Together, `meta-llama/Llama-3-70b-chat-hf`, priority 1
Both have same priority → middleware picks one (implementation-defined, typically round-robin or random).
WARNING
Load balancing behavior with equal priorities is not guaranteed. For explicit control, use different priorities.
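If you want the distribution to be deterministic, one option is to rotate on the client side. A round-robin sketch with the two routes above (illustrative dicts; Admin Bud-E's tie-breaking between equal priorities remains implementation-defined):

```python
import itertools

# Explicit client-side round-robin between two equal-priority routes.
routes = [
    {"provider": "vertex", "model": "gemini-1.5-flash"},
    {"provider": "together", "model": "meta-llama/Llama-3-70b-chat-hf"},
]
rotation = itertools.cycle(routes)

picks = [next(rotation)["provider"] for _ in range(4)]
print(picks)  # ['vertex', 'together', 'vertex', 'together']
```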
Cost vs. Quality Tiers
LLM Routes:
- Together, `meta-llama/Llama-3-8b-chat-hf`, priority 1 (cheap, fast, decent)
- Vertex, `gemini-1.5-flash`, priority 2 (more capable, more expensive)
- Vertex, `gemini-1.5-pro`, priority 3 (best quality, most expensive)
Most requests use the cheap model. On rate-limits or errors, step up to better models.
Model-Specific Routing
Different models for different use cases:
Text-only (LLM):
- Vertex, `gemini-1.5-flash`, priority 1
Multimodal/Images (VLM):
- Vertex, `gemini-1.5-pro`, priority 1 (better vision understanding)
Long context:
- Together, `togethercomputer/CodeLlama-34b-Instruct`, priority 1