Pricing Setup

Pricing tells Admin Bud-E how much to deduct from user credits for each request. You define costs per model in the same units the providers use:

  • LLM/VLM: Cost per 1,000,000 tokens (input and output separately)
  • TTS: Cost per character
  • ASR: Cost per hour of audio (fallback) or per token if reported

Why Pricing Matters

When a user makes a request:

  1. The middleware forwards it to a provider
  2. The provider returns usage metrics (tokens, characters, audio duration)
  3. Admin Bud-E multiplies usage by your pricing → credits to deduct
  4. Credits are subtracted from the user's balance
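The steps above boil down to a pricing lookup followed by a multiply. A minimal sketch in Python (the table layout, function name, and fallback behavior here are illustrative assumptions, not Admin Bud-E's actual internals):

```python
# Illustrative sketch of the deduction step. PRICING and
# credits_for_llm are assumed names, not Admin Bud-E's API.

PRICING = {
    # model: (input cost per 1M tokens, output cost per 1M tokens)
    "gemini-1.5-flash": (0.075, 0.30),
}

def credits_for_llm(model: str, input_tokens: int, output_tokens: int) -> float:
    """Multiply token usage by the model's per-1M-token pricing."""
    entry = PRICING.get(model)
    if entry is None:
        return 0.0  # no pricing entry: usage tracked, nothing billed
    input_cost, output_cost = entry
    return (input_tokens / 1_000_000) * input_cost \
         + (output_tokens / 1_000_000) * output_cost

cost = credits_for_llm("gemini-1.5-flash", 1_500, 500)
```

With 1,500 input and 500 output tokens this comes to 0.0002625 credits; a missing model deducts nothing, which is exactly the "unlimited free usage" failure mode described below.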

Without pricing entries:

  • No credits are deducted
  • Usage is tracked but not billed
  • Users can consume unlimited resources

DANGER

Always configure pricing before giving users access, or they'll use services for free.

Adding Pricing

  1. Navigate to Pricing in the Admin UI
  2. Click Add Pricing
  3. Fill in:
    • Model: Model identifier (must match route configuration)
    • Service Type: LLM, VLM, TTS, or ASR
    • Input Cost: Cost per unit (for LLM/VLM input tokens)
    • Output Cost: Cost per unit (for LLM/VLM output tokens)
    • Character Cost: Cost per character (for TTS)
    • Time Cost: Cost per hour (for ASR fallback)
  4. Click Save

LLM and VLM Pricing

Language and vision models charge separately for:

  • Input tokens (what you send to the model)
  • Output tokens (what the model generates)

Units: Cost per 1,000,000 tokens (1M tokens)

Example: Gemini 1.5 Flash (Vertex AI)

Check Google's pricing page:

  • Input: $0.075 per 1M tokens
  • Output: $0.30 per 1M tokens

Admin Bud-E configuration:

  • Model: gemini-1.5-flash
  • Service Type: LLM (or VLM for multimodal)
  • Input Cost: 0.075
  • Output Cost: 0.30

Example: Gemini 1.5 Pro (Vertex AI)

  • Input: $1.25 per 1M tokens (≤128K context)
  • Output: $5.00 per 1M tokens (≤128K context)

Admin Bud-E configuration:

  • Model: gemini-1.5-pro
  • Service Type: LLM or VLM
  • Input Cost: 1.25
  • Output Cost: 5.00

INFO

Prices vary by context length. For simplicity, use the base tier pricing and monitor usage.

Example: Together Llama 3 70B

Check Together's pricing:

  • Input: $0.90 per 1M tokens
  • Output: $0.90 per 1M tokens

Admin Bud-E configuration:

  • Model: meta-llama/Llama-3-70b-chat-hf
  • Service Type: LLM
  • Input Cost: 0.90
  • Output Cost: 0.90

Example: Mistral Large

Check Mistral's pricing:

  • Input: $2.00 per 1M tokens
  • Output: $6.00 per 1M tokens

Admin Bud-E configuration:

  • Model: mistral-large-latest
  • Service Type: LLM
  • Input Cost: 2.00
  • Output Cost: 6.00

TTS (Text-to-Speech) Pricing

Text-to-speech charges per character sent to the API.

Units: Cost per character

Example: Google Cloud TTS

Check Cloud TTS pricing:

  • Standard voices: $4.00 per 1M characters = $0.000004 per character
  • Neural2 voices: $16.00 per 1M characters = $0.000016 per character

Admin Bud-E configuration:

  • Model: en-US-Neural2-C
  • Service Type: TTS
  • Character Cost: 0.000016

TIP

Per-character costs are very small. Enter them in decimal notation like 0.000016, or in scientific notation if supported.

Example: Other TTS Providers

Check the provider's pricing page and convert to cost-per-character:

If priced per 1M characters:

Cost per character = (Price per 1M characters) / 1,000,000

If priced per 1K characters:

Cost per character = (Price per 1K characters) / 1,000
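Both conversions are one-liners. A quick sketch in Python (the helper names are illustrative, not part of any API):

```python
# Convert provider TTS price sheets to Admin Bud-E's
# cost-per-character unit.

def per_char_from_per_1m(price_per_1m_chars: float) -> float:
    """Price quoted per 1,000,000 characters -> cost per character."""
    return price_per_1m_chars / 1_000_000

def per_char_from_per_1k(price_per_1k_chars: float) -> float:
    """Price quoted per 1,000 characters -> cost per character."""
    return price_per_1k_chars / 1_000

neural2 = per_char_from_per_1m(16.00)   # $16 per 1M chars -> 0.000016 per char
standard = per_char_from_per_1m(4.00)   # $4 per 1M chars  -> 0.000004 per char
```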

ASR (Speech-to-Text) Pricing

Speech recognition can be priced two ways:

  1. Token-based (if provider reports token usage)
  2. Time-based (per hour or minute of audio — used as fallback)

Token-Based ASR

If your provider returns token counts for transcriptions, use token pricing like LLM:

Admin Bud-E configuration:

  • Model: whisper-large-v3
  • Service Type: ASR
  • Input Cost: Cost per 1M tokens
  • Output Cost: 0 (usually no output tokens for ASR)

Time-Based ASR (Fallback)

If token usage isn't reported, Admin Bud-E calculates cost based on audio duration.

Units: Cost per hour of audio

Example: Google Cloud Speech-to-Text

Check Cloud STT pricing:

  • Standard: $1.44 per hour = $0.024 per minute

Admin Bud-E configuration:

  • Model: default
  • Service Type: ASR
  • Time Cost: 1.44 (per hour)

INFO

Admin Bud-E internally tracks audio duration in seconds, then converts to hours for billing.

Example: Other ASR Providers

If priced per minute:

Cost per hour = (Price per minute) × 60

If priced per second:

Cost per hour = (Price per second) × 3600
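The same one-line conversions, sketched in Python (helper names are illustrative):

```python
# Convert per-minute or per-second ASR prices to Admin Bud-E's
# per-hour unit.

def per_hour_from_per_minute(price_per_minute: float) -> float:
    """Price quoted per minute of audio -> cost per hour."""
    return price_per_minute * 60

def per_hour_from_per_second(price_per_second: float) -> float:
    """Price quoted per second of audio -> cost per hour."""
    return price_per_second * 3600

google_stt = per_hour_from_per_minute(0.024)  # $0.024/min -> $1.44/hour
```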

Credit Calculations

Example 1: LLM Request

Model: Gemini 1.5 Flash

Pricing:

  • Input: $0.075 per 1M tokens
  • Output: $0.30 per 1M tokens

Usage:

  • Input: 1,500 tokens
  • Output: 500 tokens

Calculation:

Input cost  = (1,500 / 1,000,000) × 0.075 = 0.0001125 credits
Output cost = (500 / 1,000,000) × 0.30   = 0.00015 credits
Total       = 0.0001125 + 0.00015        = 0.0002625 credits

Result: 0.0002625 credits deducted from user balance.

Example 2: TTS Request

Model: en-US-Neural2-C

Pricing: $0.000016 per character

Usage: 250 characters

Calculation:

Cost = 250 × 0.000016 = 0.004 credits

Result: 0.004 credits deducted.

Example 3: ASR Request

Model: Google STT default

Pricing: $1.44 per hour

Usage: 45 seconds of audio

Calculation:

Hours = 45 / 3600 = 0.0125 hours
Cost  = 0.0125 × 1.44 = 0.018 credits

Result: 0.018 credits deducted.
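The three worked examples above can be reproduced with plain arithmetic (no Admin Bud-E code involved):

```python
# Example 1: LLM (Gemini 1.5 Flash), 1,500 input + 500 output tokens
llm = (1_500 / 1_000_000) * 0.075 + (500 / 1_000_000) * 0.30  # ~0.0002625 credits

# Example 2: TTS (en-US-Neural2-C), 250 characters
tts = 250 * 0.000016                                          # ~0.004 credits

# Example 3: ASR (Google STT), 45 seconds of audio
asr = (45 / 3600) * 1.44                                      # ~0.018 credits
```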

Markup and Margins

You may want to charge users more than the raw provider cost to:

  • Cover operational expenses (server, bandwidth)
  • Build a reserve fund
  • Provide admin overhead budget

How to apply markup:

Multiply provider costs by your markup factor.

Example: 20% markup

Provider cost: $0.075 per 1M input tokens
Your cost: $0.075 × 1.2 = $0.09 per 1M input tokens

Example: 50% markup

Provider cost: $0.30 per 1M output tokens
Your cost: $0.30 × 1.5 = $0.45 per 1M output tokens

Example: 2× markup (double)

Provider cost: $1.44 per hour ASR
Your cost: $1.44 × 2 = $2.88 per hour
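A small helper for this conversion might look like the following (the name and signature are illustrative):

```python
def apply_markup(provider_cost: float, markup_percent: float) -> float:
    """Return the price to enter in pricing config after a percentage markup."""
    return provider_cost * (1 + markup_percent / 100)

a = apply_markup(0.075, 20)   # 20% markup   -> ~0.09
b = apply_markup(0.30, 50)    # 50% markup   -> ~0.45
c = apply_markup(1.44, 100)   # 2x (double)  -> ~2.88
```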

TIP

For schools and non-profits, a small markup (10-20%) is common. For commercial use, 50-200% is typical.

Editing Pricing

To update pricing:

  1. Navigate to Pricing
  2. Click Edit on the pricing entry
  3. Update costs
  4. Click Save

Effect:

  • New pricing applies to future requests only
  • Past usage/credits are not recalculated
  • Usage reports show historical costs

Deleting Pricing

To remove a pricing entry:

  1. Navigate to Pricing
  2. Click Delete
  3. Confirm deletion

WARNING

If you delete pricing for an active model:

  • Requests to that model will not deduct credits
  • Usage is tracked but not billed
  • Users get "free" usage

Rounding and Precision

Admin Bud-E tracks credits with high precision (typically 8+ decimal places internally). In the UI and reports, values may be rounded for readability.

Example:

  • Actual: 0.00026253 credits
  • Displayed: 0.000263 credits (rounded)
  • Storage: Full precision maintained

This prevents rounding errors from accumulating over thousands of requests.
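The effect is easy to demonstrate: summing a small per-request cost many times drifts slightly with binary floats but stays exact with decimal arithmetic. A quick illustration in Python (this is not Admin Bud-E's actual storage code):

```python
from decimal import Decimal

n = 10_000  # ten thousand identical requests

# Binary float accumulation: close to 2.6253, with tiny rounding drift
drifted = sum(0.00026253 for _ in range(n))

# Decimal accumulation: exactly 2.6253
exact = sum(Decimal("0.00026253") for _ in range(n))
```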

Default Pricing

If a request uses a model with no pricing entry:

Behavior:

  • Request proceeds normally
  • Usage metrics are logged
  • Zero credits are deducted

This can be useful for:

  • Testing new models
  • Free trial periods
  • Internal/admin usage

DANGER

Don't rely on missing pricing for access control. If a model should be unavailable, remove the route or disable the provider.

Bulk Pricing Setup

For multiple models with similar pricing:

  1. Add pricing for one model
  2. Note the values
  3. Duplicate for other models, adjusting as needed

Example: All Gemini models

Copy pricing from gemini-1.5-flash to:

  • gemini-1.5-pro (adjust costs)
  • gemini-1.0-pro (adjust costs)
  • etc.

Testing Pricing

After configuring pricing:

  1. Make a small test request
  2. Check Usage page
  3. Verify:
    • Credits deducted correctly
    • Matches expected calculation
    • No errors in logs

Example test:

  • Send a short LLM prompt (~100 tokens)
  • Expected cost: ~0.0001 credits
  • Check user's credit balance before/after

Common Mistakes

Mistake 1: Wrong Units

Problem: Entered $0.075 per token instead of per 1M tokens.

Result: Users charged 1 million times too much.

Solution: Always use cost per 1,000,000 tokens for LLM/VLM.

Mistake 2: Swapped Input/Output

Problem: Put output cost in input field and vice versa.

Result: Wrong credits deducted (usually overcharge on long prompts).

Solution: Double-check which is which. Input = user's prompt, Output = model's response.

Mistake 3: Forgot to Add Pricing

Problem: Configured routes but forgot pricing.

Result: Users get free usage.

Solution: Always add pricing before activating routes.

Mistake 4: Model Name Mismatch

Problem: Pricing uses gemini-flash but route uses gemini-1.5-flash.

Result: No pricing found → zero credits deducted.

Solution: Model name in pricing must exactly match model name in route (case-sensitive).

Next Steps