Pricing Setup

Pricing tells Admin Bud-E how much to deduct from user credits for each request. You define costs per model in the same units the providers use:

  • LLM/VLM: Cost per 1,000,000 tokens (input and output separately)
  • TTS: Cost per character
  • ASR: Cost per hour of audio (fallback) or per token if reported

Why Pricing Matters

When a user makes a request:

  1. The middleware forwards it to a provider
  2. The provider returns usage metrics (tokens, characters, audio duration)
  3. Admin Bud-E multiplies usage by your pricing → credits to deduct
  4. Credits are subtracted from the user's balance
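The steps above boil down to a pricing lookup followed by a multiply. A minimal sketch in Python (the table layout, function name, and fallback behavior here are illustrative assumptions, not Admin Bud-E's actual internals):

```python
# Illustrative sketch of the deduction step. PRICING and
# credits_for_llm are assumed names, not Admin Bud-E's API.

PRICING = {
    # model: (input cost per 1M tokens, output cost per 1M tokens)
    "gemini-1.5-flash": (0.075, 0.30),
}

def credits_for_llm(model: str, input_tokens: int, output_tokens: int) -> float:
    """Multiply token usage by the model's per-1M-token pricing."""
    entry = PRICING.get(model)
    if entry is None:
        return 0.0  # no pricing entry: usage tracked, nothing billed
    input_cost, output_cost = entry
    return (input_tokens / 1_000_000) * input_cost \
         + (output_tokens / 1_000_000) * output_cost

cost = credits_for_llm("gemini-1.5-flash", 1_500, 500)
```

With 1,500 input and 500 output tokens this comes to 0.0002625 credits; a missing model deducts nothing, which is exactly the "unlimited free usage" failure mode described below.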

Without pricing entries:

  • No credits are deducted
  • Usage is tracked but not billed
  • Users can consume unlimited resources

DANGER

Always configure pricing before giving users access, or they'll use services for free.

Adding Pricing

  1. Navigate to Pricing in the Admin UI
  2. Click Add Pricing
  3. Fill in:
    • Model: Model identifier (must match route configuration)
    • Service Type: LLM, VLM, TTS, or ASR
    • Input Cost: Cost per unit (for LLM/VLM input tokens)
    • Output Cost: Cost per unit (for LLM/VLM output tokens)
    • Character Cost: Cost per character (for TTS)
    • Time Cost: Cost per hour (for ASR fallback)
  4. Click Save

LLM and VLM Pricing

Language and vision models charge separately for:

  • Input tokens (what you send to the model)
  • Output tokens (what the model generates)

Units: Cost per 1,000,000 tokens (1M tokens)

Example: Gemini 1.5 Flash (Vertex AI)

Check Google's pricing page:

  • Input: $0.075 per 1M tokens
  • Output: $0.30 per 1M tokens

Admin Bud-E configuration:

  • Model: gemini-1.5-flash
  • Service Type: LLM (or VLM for multimodal)
  • Input Cost: 0.075
  • Output Cost: 0.30

Example: Gemini 1.5 Pro (Vertex AI)

  • Input: $1.25 per 1M tokens (≤128K context)
  • Output: $5.00 per 1M tokens (≤128K context)

Admin Bud-E configuration:

  • Model: gemini-1.5-pro
  • Service Type: LLM or VLM
  • Input Cost: 1.25
  • Output Cost: 5.00

INFO

Prices vary by context length. For simplicity, use the base tier pricing and monitor usage.

Example: Together Llama 3 70B

Check Together's pricing:

  • Input: $0.90 per 1M tokens
  • Output: $0.90 per 1M tokens

Admin Bud-E configuration:

  • Model: meta-llama/Llama-3-70b-chat-hf
  • Service Type: LLM
  • Input Cost: 0.90
  • Output Cost: 0.90

Example: Mistral Large

Check Mistral's pricing:

  • Input: $2.00 per 1M tokens
  • Output: $6.00 per 1M tokens

Admin Bud-E configuration:

  • Model: mistral-large-latest
  • Service Type: LLM
  • Input Cost: 2.00
  • Output Cost: 6.00

TTS (Text-to-Speech) Pricing

Text-to-speech charges per character sent to the API.

Units: Cost per character

Example: Google Cloud TTS

Check Cloud TTS pricing:

  • Standard voices: $4.00 per 1M characters = $0.000004 per character
  • Neural2 voices: $16.00 per 1M characters = $0.000016 per character

Admin Bud-E configuration:

  • Model: en-US-Neural2-C
  • Service Type: TTS
  • Character Cost: 0.000016

TIP

Per-character costs are very small. Enter them in decimal notation like 0.000016, or in scientific notation if supported.

Example: Other TTS Providers

Check the provider's pricing page and convert to cost-per-character:

If priced per 1M characters:

Cost per character = (Price per 1M characters) / 1,000,000

If priced per 1K characters:

Cost per character = (Price per 1K characters) / 1,000
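Both conversions are one-liners. A quick sketch in Python (the helper names are illustrative, not part of any API):

```python
# Convert provider TTS price sheets to Admin Bud-E's
# cost-per-character unit.

def per_char_from_per_1m(price_per_1m_chars: float) -> float:
    """Price quoted per 1,000,000 characters -> cost per character."""
    return price_per_1m_chars / 1_000_000

def per_char_from_per_1k(price_per_1k_chars: float) -> float:
    """Price quoted per 1,000 characters -> cost per character."""
    return price_per_1k_chars / 1_000

neural2 = per_char_from_per_1m(16.00)   # $16 per 1M chars -> 0.000016 per char
standard = per_char_from_per_1m(4.00)   # $4 per 1M chars  -> 0.000004 per char
```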

ASR (Speech-to-Text) Pricing

Speech recognition can be priced two ways:

  1. Token-based (if provider reports token usage)
  2. Time-based (per hour or minute of audio — used as fallback)

Token-Based ASR

If your provider returns token counts for transcriptions, use token pricing like LLM:

Admin Bud-E configuration:

  • Model: whisper-large-v3
  • Service Type: ASR
  • Input Cost: Cost per 1M tokens
  • Output Cost: 0 (usually no output tokens for ASR)

Time-Based ASR (Fallback)

If token usage isn't reported, Admin Bud-E calculates cost based on audio duration.

Units: Cost per hour of audio

Example: Google Cloud Speech-to-Text

Check Cloud STT pricing:

  • Standard: $1.44 per hour = $0.024 per minute

Admin Bud-E configuration:

  • Model: default
  • Service Type: ASR
  • Time Cost: 1.44 (per hour)

INFO

Admin Bud-E internally tracks audio duration in seconds, then converts to hours for billing.

Example: Other ASR Providers

If priced per minute:

Cost per hour = (Price per minute) × 60

If priced per second:

Cost per hour = (Price per second) × 3600
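The same one-line conversions, sketched in Python (helper names are illustrative):

```python
# Convert per-minute or per-second ASR prices to Admin Bud-E's
# per-hour unit.

def per_hour_from_per_minute(price_per_minute: float) -> float:
    """Price quoted per minute of audio -> cost per hour."""
    return price_per_minute * 60

def per_hour_from_per_second(price_per_second: float) -> float:
    """Price quoted per second of audio -> cost per hour."""
    return price_per_second * 3600

google_stt = per_hour_from_per_minute(0.024)  # $0.024/min -> $1.44/hour
```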

Credit Calculations

Example 1: LLM Request

Model: Gemini 1.5 Flash

Pricing:

  • Input: $0.075 per 1M tokens
  • Output: $0.30 per 1M tokens

Usage:

  • Input: 1,500 tokens
  • Output: 500 tokens

Calculation:

Input cost  = (1,500 / 1,000,000) × 0.075 = 0.0001125 credits
Output cost = (500 / 1,000,000) × 0.30   = 0.00015 credits
Total       = 0.0001125 + 0.00015        = 0.0002625 credits

Result: 0.0002625 credits deducted from user balance.

Example 2: TTS Request

Model: en-US-Neural2-C

Pricing: $0.000016 per character

Usage: 250 characters

Calculation:

Cost = 250 × 0.000016 = 0.004 credits

Result: 0.004 credits deducted.

Example 3: ASR Request

Model: Google STT default

Pricing: $1.44 per hour

Usage: 45 seconds of audio

Calculation:

Hours = 45 / 3600 = 0.0125 hours
Cost  = 0.0125 × 1.44 = 0.018 credits

Result: 0.018 credits deducted.
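The three worked examples above can be reproduced with plain arithmetic (no Admin Bud-E code involved):

```python
# Example 1: LLM (Gemini 1.5 Flash), 1,500 input + 500 output tokens
llm = (1_500 / 1_000_000) * 0.075 + (500 / 1_000_000) * 0.30  # ~0.0002625 credits

# Example 2: TTS (en-US-Neural2-C), 250 characters
tts = 250 * 0.000016                                          # ~0.004 credits

# Example 3: ASR (Google STT), 45 seconds of audio
asr = (45 / 3600) * 1.44                                      # ~0.018 credits
```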

Markup and Margins

You may want to charge users more than the raw provider cost to:

  • Cover operational expenses (server, bandwidth)
  • Build a reserve fund
  • Provide admin overhead budget

How to apply markup:

Multiply provider costs by your markup factor.

Example: 20% markup

Provider cost: $0.075 per 1M input tokens
Your cost: $0.075 × 1.2 = $0.09 per 1M input tokens

Example: 50% markup

Provider cost: $0.30 per 1M output tokens
Your cost: $0.30 × 1.5 = $0.45 per 1M output tokens

Example: 2× markup (double)

Provider cost: $1.44 per hour ASR
Your cost: $1.44 × 2 = $2.88 per hour
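A small helper for this conversion might look like the following (the name and signature are illustrative):

```python
def apply_markup(provider_cost: float, markup_percent: float) -> float:
    """Return the price to enter in pricing config after a percentage markup."""
    return provider_cost * (1 + markup_percent / 100)

a = apply_markup(0.075, 20)   # 20% markup   -> ~0.09
b = apply_markup(0.30, 50)    # 50% markup   -> ~0.45
c = apply_markup(1.44, 100)   # 2x (double)  -> ~2.88
```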

TIP

For schools and non-profits, a small markup (10-20%) is common. For commercial use, 50-200% is typical.

Editing Pricing

To update pricing:

  1. Navigate to Pricing
  2. Click Edit on the pricing entry
  3. Update costs
  4. Click Save

Effect:

  • New pricing applies to future requests only
  • Past usage/credits are not recalculated
  • Usage reports show historical costs

Deleting Pricing

To remove a pricing entry:

  1. Navigate to Pricing
  2. Click Delete
  3. Confirm deletion

WARNING

If you delete pricing for an active model:

  • Requests to that model will not deduct credits
  • Usage is tracked but not billed
  • Users get "free" usage

Rounding and Precision

Admin Bud-E tracks credits with high precision (typically 8+ decimal places internally). In the UI and reports, values may be rounded for readability.

Example:

  • Actual: 0.00026253 credits
  • Displayed: 0.000263 credits (rounded)
  • Storage: Full precision maintained

This prevents rounding errors from accumulating over thousands of requests.
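The effect is easy to demonstrate: summing a small per-request cost many times drifts slightly with binary floats but stays exact with decimal arithmetic. A quick illustration in Python (this is not Admin Bud-E's actual storage code):

```python
from decimal import Decimal

n = 10_000  # ten thousand identical requests

# Binary float accumulation: close to 2.6253, with tiny rounding drift
drifted = sum(0.00026253 for _ in range(n))

# Decimal accumulation: exactly 2.6253
exact = sum(Decimal("0.00026253") for _ in range(n))
```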

Default Pricing

If a request uses a model with no pricing entry:

Behavior:

  • Request proceeds normally
  • Usage metrics are logged
  • Zero credits are deducted

This can be useful for:

  • Testing new models
  • Free trial periods
  • Internal/admin usage

DANGER

Don't rely on missing pricing for access control. If a model should be unavailable, remove the route or disable the provider.

Bulk Pricing Setup

For multiple models with similar pricing:

  1. Add pricing for one model
  2. Note the values
  3. Duplicate for other models, adjusting as needed

Example: All Gemini models

Copy pricing from gemini-1.5-flash to:

  • gemini-1.5-pro (adjust costs)
  • gemini-1.0-pro (adjust costs)
  • etc.

Testing Pricing

After configuring pricing:

  1. Make a small test request
  2. Check Usage page
  3. Verify:
    • Credits deducted correctly
    • Matches expected calculation
    • No errors in logs

Example test:

  • Send a short LLM prompt (~100 tokens)
  • Expected cost: ~0.0001 credits
  • Check user's credit balance before/after

Common Mistakes

Mistake 1: Wrong Units

Problem: Entered $0.075 per token instead of per 1M tokens.

Result: Users charged 1 million times too much.

Solution: Always use cost per 1,000,000 tokens for LLM/VLM.

Mistake 2: Swapped Input/Output

Problem: Put output cost in input field and vice versa.

Result: Wrong credits deducted (usually overcharge on long prompts).

Solution: Double-check which is which. Input = user's prompt, Output = model's response.

Mistake 3: Forgot to Add Pricing

Problem: Configured routes but forgot pricing.

Result: Users get free usage.

Solution: Always add pricing before activating routes.

Mistake 4: Model Name Mismatch

Problem: Pricing uses gemini-flash but route uses gemini-1.5-flash.

Result: No pricing found → zero credits deducted.

Solution: Model name in pricing must exactly match model name in route (case-sensitive).

Next Steps