Pricing Setup
Pricing tells Admin Bud-E how much to deduct from a user's credits for each request. You define costs per model, converting the provider's published prices into these units:
- LLM/VLM: Cost per 1,000,000 tokens (input and output separately)
- TTS: Cost per character
- ASR: Cost per hour of audio (fallback) or per token if reported
Why Pricing Matters
When a user makes a request:
- The middleware forwards it to a provider
- The provider returns usage metrics (tokens, characters, audio duration)
- Admin Bud-E multiplies usage by your pricing → credits to deduct
- Credits are subtracted from the user's balance
Without pricing entries:
- No credits are deducted
- Usage is tracked but not billed
- Users can consume unlimited resources
DANGER
Always configure pricing before giving users access, or they'll use services for free.
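The deduction flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Admin Bud-E's actual implementation: the PRICING table, the deduct_llm_credits function, and the way the balance is passed around are hypothetical names chosen for this example.

```python
from decimal import Decimal

# Hypothetical pricing table keyed by model name (cost per 1M tokens).
PRICING = {
    "gemini-1.5-flash": {"input": Decimal("0.075"), "output": Decimal("0.30")},
}

MILLION = Decimal(1_000_000)


def deduct_llm_credits(balance, model, input_tokens, output_tokens):
    """Return (new_balance, cost) for one LLM request."""
    entry = PRICING.get(model)
    if entry is None:
        # No pricing entry: usage can still be logged, but nothing is billed.
        return balance, Decimal("0")
    cost = (Decimal(input_tokens) / MILLION) * entry["input"]
    cost += (Decimal(output_tokens) / MILLION) * entry["output"]
    return balance - cost, cost


new_balance, cost = deduct_llm_credits(Decimal("10"), "gemini-1.5-flash", 1500, 500)
free_balance, free_cost = deduct_llm_credits(Decimal("10"), "unpriced-model", 1500, 500)
```

The second call shows why missing pricing matters: the request for the unpriced model leaves the balance unchanged.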
Adding Pricing
- Navigate to Pricing in the Admin UI
- Click Add Pricing
- Fill in:
- Model: Model identifier (must match route configuration)
- Service Type: LLM, VLM, TTS, or ASR
- Input Cost: Cost per 1M input tokens (for LLM/VLM, and for token-based ASR)
- Output Cost: Cost per 1M output tokens (for LLM/VLM)
- Character Cost: Cost per character (for TTS)
- Time Cost: Cost per hour (for ASR fallback)
- Click Save
LLM and VLM Pricing
Language and vision models charge separately for:
- Input tokens (what you send to the model)
- Output tokens (what the model generates)
Units: Cost per 1,000,000 tokens (1M tokens)
Example: Gemini 1.5 Flash (Vertex AI)
Check Google's pricing page:
- Input: $0.075 per 1M tokens
- Output: $0.30 per 1M tokens
Admin Bud-E configuration:
- Model: gemini-1.5-flash
- Service Type: LLM (or VLM for multimodal)
- Input Cost: 0.075
- Output Cost: 0.30
Example: Gemini 1.5 Pro (Vertex AI)
- Input: $1.25 per 1M tokens (≤128K context)
- Output: $5.00 per 1M tokens (≤128K context)
Admin Bud-E configuration:
- Model: gemini-1.5-pro
- Service Type: LLM or VLM
- Input Cost: 1.25
- Output Cost: 5.00
INFO
Prices vary by context length. For simplicity, use the base tier pricing and monitor usage.
Example: Together Llama 3 70B
Check Together's pricing:
- Input: $0.90 per 1M tokens
- Output: $0.90 per 1M tokens
Admin Bud-E configuration:
- Model: meta-llama/Llama-3-70b-chat-hf
- Service Type: LLM
- Input Cost: 0.90
- Output Cost: 0.90
Example: Mistral Large
Check Mistral's pricing:
- Input: $2.00 per 1M tokens
- Output: $6.00 per 1M tokens
Admin Bud-E configuration:
- Model: mistral-large-latest
- Service Type: LLM
- Input Cost: 2.00
- Output Cost: 6.00
TTS (Text-to-Speech) Pricing
Text-to-speech charges per character sent to the API.
Units: Cost per character
Example: Google Cloud TTS
Check Cloud TTS pricing:
- Standard voices: $4.00 per 1M characters = $0.000004 per character
- Neural2 voices: $16.00 per 1M characters = $0.000016 per character
Admin Bud-E configuration:
- Model: en-US-Neural2-C
- Service Type: TTS
- Character Cost: 0.000016
TIP
Pricing per character is very small. Use decimal notation like 0.000016 or scientific notation if supported.
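Converting a published price into a per-character cost is a single division, and fixed-point formatting keeps the result in plain decimal notation. The sketch below assumes a hypothetical helper named per_character_cost; it is not part of Admin Bud-E.

```python
from decimal import Decimal


def per_character_cost(price, characters_per_unit):
    """Convert a price quoted per N characters into a cost per character."""
    return Decimal(str(price)) / Decimal(characters_per_unit)


# Google Cloud TTS prices from the example above, quoted per 1M characters.
standard = per_character_cost("4.00", 1_000_000)
neural2 = per_character_cost("16.00", 1_000_000)

# The "f" format forces fixed-point output instead of scientific notation.
standard_text = f"{standard:f}"
neural2_text = f"{neural2:f}"
```

Using Decimal rather than float avoids binary rounding artifacts when the values are this small.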
Example: Other TTS Providers
Check the provider's pricing page and convert to cost-per-character:
If priced per 1M characters:
Cost per character = (Price per 1M characters) / 1,000,000
If priced per 1K characters:
Cost per character = (Price per 1K characters) / 1,000
ASR (Speech-to-Text) Pricing
Speech recognition can be priced two ways:
- Token-based (if provider reports token usage)
- Time-based (per hour or minute of audio — used as fallback)
Token-Based ASR
If your provider returns token counts for transcriptions, use token pricing like LLM:
Admin Bud-E configuration:
- Model: whisper-large-v3
- Service Type: ASR
- Input Cost: Cost per 1M tokens
- Output Cost: 0 (usually no output tokens for ASR)
Time-Based ASR (Fallback)
If token usage isn't reported, Admin Bud-E calculates cost based on audio duration.
Units: Cost per hour of audio
Example: Google Cloud Speech-to-Text
Check Cloud STT pricing:
- Standard: $1.44 per hour = $0.024 per minute
Admin Bud-E configuration:
- Model: default
- Service Type: ASR
- Time Cost: 1.44 (per hour)
INFO
Admin Bud-E internally tracks audio duration in seconds, then converts to hours for billing.
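The choice between token-based and time-based billing could look roughly like this. It is a sketch of the fallback rule described above, not Admin Bud-E's code: the asr_cost function and its parameter names are assumptions, and the token price of 0.50 per 1M tokens in the usage lines is a made-up value for illustration only.

```python
from decimal import Decimal

SECONDS_PER_HOUR = Decimal(3600)
MILLION = Decimal(1_000_000)


def asr_cost(duration_seconds, time_cost_per_hour,
             tokens=None, input_cost_per_million=None):
    """Bill by tokens when the provider reports them, else by audio duration."""
    if tokens is not None and input_cost_per_million is not None:
        return (Decimal(tokens) / MILLION) * Decimal(str(input_cost_per_million))
    # Fallback: duration is tracked in seconds and converted to hours.
    hours = Decimal(str(duration_seconds)) / SECONDS_PER_HOUR
    return hours * Decimal(str(time_cost_per_hour))


# 45 seconds of audio at 1.44 per hour, no token usage reported.
time_based = asr_cost(45, "1.44")

# Same audio, but the provider reported 2,000 tokens (hypothetical price).
token_based = asr_cost(45, "1.44", tokens=2000, input_cost_per_million="0.50")
```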
Example: Other ASR Providers
If priced per minute:
Cost per hour = (Price per minute) × 60
If priced per second:
Cost per hour = (Price per second) × 3600
Credit Calculations
Example 1: LLM Request
Model: Gemini 1.5 Flash
Pricing:
- Input: $0.075 per 1M tokens
- Output: $0.30 per 1M tokens
Usage:
- Input: 1,500 tokens
- Output: 500 tokens
Calculation:
Input cost = (1,500 / 1,000,000) × 0.075 = 0.0001125 credits
Output cost = (500 / 1,000,000) × 0.30 = 0.00015 credits
Total = 0.0001125 + 0.00015 = 0.0002625 credits
Result: 0.0002625 credits deducted from user balance.
Example 2: TTS Request
Model: en-US-Neural2-C
Pricing: $0.000016 per character
Usage: 250 characters
Calculation:
Cost = 250 × 0.000016 = 0.004 credits
Result: 0.004 credits deducted.
Example 3: ASR Request
Model: Google STT default
Pricing: $1.44 per hour
Usage: 45 seconds of audio
Calculation:
Hours = 45 / 3600 = 0.0125 hours
Cost = 0.0125 × 1.44 = 0.018 credits
Result: 0.018 credits deducted.
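The three worked examples can be reproduced with one small calculator. This is a sketch only: the PricingEntry class mirrors the fields of the Add Pricing form, but the class, the credits_for function, and the service-type dispatch are hypothetical and do not describe Admin Bud-E's internals.

```python
from dataclasses import dataclass
from decimal import Decimal

MILLION = Decimal(1_000_000)
SECONDS_PER_HOUR = Decimal(3600)


@dataclass
class PricingEntry:
    # Field names mirror the Add Pricing form; the class itself is hypothetical.
    model: str
    service_type: str                     # "LLM", "VLM", "TTS", or "ASR"
    input_cost: Decimal = Decimal(0)      # per 1M input tokens
    output_cost: Decimal = Decimal(0)     # per 1M output tokens
    character_cost: Decimal = Decimal(0)  # per character
    time_cost: Decimal = Decimal(0)       # per hour of audio


def credits_for(entry, input_tokens=0, output_tokens=0, characters=0, seconds=0):
    """Compute the credits for one request from the reported usage."""
    if entry.service_type in ("LLM", "VLM"):
        return (Decimal(input_tokens) / MILLION * entry.input_cost
                + Decimal(output_tokens) / MILLION * entry.output_cost)
    if entry.service_type == "TTS":
        return Decimal(characters) * entry.character_cost
    if entry.service_type == "ASR":
        # Time-based fallback only; token-based ASR is not covered here.
        return Decimal(seconds) / SECONDS_PER_HOUR * entry.time_cost
    raise ValueError(f"unknown service type: {entry.service_type}")


flash = PricingEntry("gemini-1.5-flash", "LLM",
                     input_cost=Decimal("0.075"), output_cost=Decimal("0.30"))
neural2 = PricingEntry("en-US-Neural2-C", "TTS", character_cost=Decimal("0.000016"))
stt = PricingEntry("default", "ASR", time_cost=Decimal("1.44"))

llm_credits = credits_for(flash, input_tokens=1500, output_tokens=500)
tts_credits = credits_for(neural2, characters=250)
asr_credits = credits_for(stt, seconds=45)
```

The three results equal the totals in Examples 1 to 3, which makes the snippet a quick way to sanity-check new pricing values before entering them.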
Markup and Margins
You may want to charge users more than the raw provider cost to:
- Cover operational expenses (server, bandwidth)
- Build a reserve fund
- Provide admin overhead budget
How to apply markup:
Multiply provider costs by your markup factor.
Example: 20% markup
Provider cost: $0.075 per 1M input tokens
Your price: $0.075 × 1.2 = $0.09 per 1M input tokens
Example: 50% markup
Provider cost: $0.30 per 1M output tokens
Your price: $0.30 × 1.5 = $0.45 per 1M output tokens
Example: 2× markup (double, i.e. 100%)
Provider cost: $1.44 per hour ASR
Your price: $1.44 × 2 = $2.88 per hour
TIP
For schools and non-profits, a small markup (10-20%) is common. For commercial use, 50-200% is typical.
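A percentage markup translates into a multiplication factor of 1 + percent / 100. The helper below is a hypothetical convenience for working out the values to type into the pricing form; it is not an Admin Bud-E feature.

```python
from decimal import Decimal


def apply_markup(provider_cost, markup_percent):
    """Return the price to enter for a given percentage markup."""
    factor = Decimal(1) + Decimal(str(markup_percent)) / Decimal(100)
    return Decimal(str(provider_cost)) * factor


input_price = apply_markup("0.075", 20)   # 20% markup on input tokens
output_price = apply_markup("0.30", 50)   # 50% markup on output tokens
asr_price = apply_markup("1.44", 100)     # 100% markup, i.e. double
```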
Editing Pricing
To update pricing:
- Navigate to Pricing
- Click Edit on the pricing entry
- Update costs
- Click Save
Effect:
- New pricing applies to future requests only
- Past usage/credits are not recalculated
- Usage reports show historical costs
Deleting Pricing
To remove a pricing entry:
- Navigate to Pricing
- Click Delete
- Confirm deletion
WARNING
If you delete pricing for an active model:
- Requests to that model will not deduct credits
- Usage is tracked but not billed
- Users get "free" usage
Rounding and Precision
Admin Bud-E tracks credits with high precision (typically 8+ decimal places internally). In the UI and reports, values may be rounded for readability.
Example:
- Actual: 0.00026253 credits
- Displayed: 0.000263 credits (rounded)
- Storage: Full precision maintained
This prevents rounding errors from accumulating over thousands of requests.
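To see why rounding should happen only at display time, compare summing full-precision values with summing values that were rounded first. The sketch assumes six displayed decimal places and half-up rounding, matching the example above; the actual display rules in Admin Bud-E may differ.

```python
from decimal import Decimal, ROUND_HALF_UP

DISPLAY_PLACES = Decimal("0.000001")  # six decimal places (assumption)


def display(value):
    """Round a stored credit value for display only."""
    return value.quantize(DISPLAY_PLACES, rounding=ROUND_HALF_UP)


stored = Decimal("0.00026253")
requests = 10_000

exact_total = stored * requests             # full precision kept in storage
rounded_total = display(stored) * requests  # rounding before summing
drift = rounded_total - exact_total         # error introduced by early rounding
```

With these numbers, rounding each request before summing overstates the total by 0.0047 credits across 10,000 requests.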
Default Pricing
If a request uses a model with no pricing entry:
Behavior:
- Request proceeds normally
- Usage metrics are logged
- Zero credits are deducted
This can be useful for:
- Testing new models
- Free trial periods
- Internal/admin usage
DANGER
Don't rely on missing pricing for access control. If a model should be unavailable, remove the route or disable the provider.
Bulk Pricing Setup
For multiple models with similar pricing:
- Add pricing for one model
- Note the values
- Duplicate for other models, adjusting as needed
Example: All Gemini models
Copy pricing from gemini-1.5-flash to:
- gemini-1.5-pro (adjust costs)
- gemini-1.0-pro (adjust costs)
- etc.
Testing Pricing
After configuring pricing:
- Make a small test request
- Check Usage page
- Verify:
- Credits deducted correctly
- Matches expected calculation
- No errors in logs
Example test:
- Send a short LLM prompt (~100 tokens)
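- Expected cost: depends on the model's pricing (for example, 100 input tokens at 0.075 per 1M tokens cost 0.0000075 credits, plus the cost of the output tokens)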
- Check user's credit balance before/after
Common Mistakes
Mistake 1: Wrong Units
Problem: Entered $0.075 per token instead of per 1M tokens.
Result: Users charged 1 million times too much.
Solution: Always use cost per 1,000,000 tokens for LLM/VLM.
Mistake 2: Swapped Input/Output
Problem: Put output cost in input field and vice versa.
Result: Wrong credits deducted (usually overcharge on long prompts).
Solution: Double-check which is which. Input = user's prompt, Output = model's response.
Mistake 3: Forgot to Add Pricing
Problem: Configured routes but forgot pricing.
Result: Users get free usage.
Solution: Always add pricing before activating routes.
Mistake 4: Model Name Mismatch
Problem: Pricing uses gemini-flash but route uses gemini-1.5-flash.
Result: No pricing found → zero credits deducted.
Solution: Model name in pricing must exactly match model name in route (case-sensitive).
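Missing or mismatched entries (Mistakes 3 and 4) can be caught by comparing the two lists of model names. The function below is a hypothetical check that assumes you have copied the model names from your routes and pricing entries into plain lists; how you obtain those lists is outside the scope of this sketch.

```python
def unpriced_models(route_models, priced_models):
    """Return route model names with no exactly matching pricing entry."""
    priced = set(priced_models)
    # Exact, case-sensitive comparison, as required for pricing lookups.
    return [name for name in route_models if name not in priced]


routes = ["gemini-1.5-flash", "mistral-large-latest"]
pricing = ["gemini-flash", "mistral-large-latest"]

missing = unpriced_models(routes, pricing)
case_mismatch = unpriced_models(["Gemini-1.5-Flash"], ["gemini-1.5-flash"])
```

Any name returned by the check would be served without deducting credits until a matching pricing entry is added.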