Monitoring Usage and Costs

This guide covers tracking API usage, understanding costs, and managing rate limits.

Real-Time Usage Tracking

Every API request is logged with:

Timestamp and request ID
Model and service used
Input/output token counts
Calculated costs
Success/failure status

Checking Usage via API

Current Period Usage

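import requests

API_URL = "http://localhost:8000/api/v1"
API_KEY = "sk_your_api_key"

# Fetch usage totals for the key sent in the X-API-Key header
response = requests.get(
    f"{API_URL}/usage",
    headers={"X-API-Key": API_KEY},
    timeout=10,  # avoid hanging indefinitely if the server is unreachable
)
response.raise_for_status()  # fail early on 4xx/5xx instead of parsing an error body

usage = response.json()
print(f"Current Hour: ${usage['hour']['cost']:.4f}")
print(f"Current Day: ${usage['day']['cost']:.4f}")
print(f"Current Month: ${usage['month']['cost']:.4f}")
print(f"Total (All Time): ${usage['total']['cost']:.4f}")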

Response Structure

The response reports usage for the current hour, day, and month, plus an all-time total. In the sample below, percent_used matches cost as a percentage of limit_block (for the hour, 0.0135 / 5.0 = 0.27%).

{
  "api_key": "image-workflow-prod",
  "service": "google-gemini",
  "hour": {
    "requests": 42,
    "tokens": {"input": 15000, "output": 8000},
    "cost": 0.0135,
    "limit_warn": 2.0,
    "limit_block": 5.0,
    "percent_used": 0.27
  },
  "day": {
    "requests": 312,
    "tokens": {"input": 125000, "output": 62000},
    "cost": 0.1245,
    "limit_warn": 20.0,
    "limit_block": 50.0,
    "percent_used": 0.25
  },
  "month": {
    "requests": 4521,
    "tokens": {"input": 1800000, "output": 920000},
    "cost": 1.82,
    "limit_warn": 500.0,
    "limit_block": 1000.0,
    "percent_used": 0.18
  },
  "total": {
    "requests": 12450,
    "tokens": {"input": 5200000, "output": 2600000},
    "cost": 5.24
  }
}
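
The per-period fields are enough to work out how much headroom is left before each limit. The following is a minimal sketch, assuming the field names from the sample response (`cost`, `limit_warn`, `limit_block`). The helper name `remaining_budget` is hypothetical and not part of the API.

```python
def remaining_budget(usage):
    """Return dollars left before the warn and block limits for each period.

    Assumes the response layout from the sample; the `total` entry carries
    no limits and is skipped.
    """
    summary = {}
    for period in ("hour", "day", "month"):
        stats = usage.get(period)
        if not stats:
            continue
        cost = stats["cost"]
        summary[period] = {
            # Clamp at zero so an exceeded limit reads as "nothing left"
            "until_warn": max(stats["limit_warn"] - cost, 0.0),
            "until_block": max(stats["limit_block"] - cost, 0.0),
        }
    return summary


# Values copied from the sample response
sample = {
    "hour": {"cost": 0.0135, "limit_warn": 2.0, "limit_block": 5.0},
    "day": {"cost": 0.1245, "limit_warn": 20.0, "limit_block": 50.0},
}
for period, left in remaining_budget(sample).items():
    print(f"{period}: ${left['until_block']:.4f} left before blocking")
```

In practice the `usage` argument would be the parsed JSON from the `/usage` call.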

Admin Dashboard Monitoring

The admin dashboard provides visual analytics:

Services Overview

Total requests per service
Cost breakdown by model
Error rates and trends
Active API keys count

Usage Analytics

Hourly/daily/monthly charts
Cost forecasting
Top consumers
Model popularity

Audit Logs

All API requests with details
Authentication events
Configuration changes
Rate limit events

Rate Limit Monitoring

Understanding Limit Levels

Level	Trigger	Behavior
Normal	< 80% of warn	Request proceeds
Warning	≥ 80% of warn, < warn	Request proceeds + warning header
Soft Limit	≥ warn, < block	Request proceeds + warning header
Hard Limit	≥ block	Request rejected (429)
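
The table can be read as a simple classification of the current cost against the two thresholds. The sketch below is illustrative only: the function name `limit_level` and the `warn_fraction` parameter are hypothetical, and the 80% value is taken from the table rather than from any confirmed server setting.

```python
def limit_level(cost, limit_warn, limit_block, warn_fraction=0.8):
    """Classify a period's cost into the levels from the table.

    `warn_fraction` is an assumption: the share of `limit_warn` at which
    the "warning" level starts (80% in the table).
    """
    if cost >= limit_block:
        return "hard"      # request would be rejected with 429
    if cost >= limit_warn:
        return "soft"      # request proceeds with a warning header
    if cost >= warn_fraction * limit_warn:
        return "warning"   # request proceeds with a warning header
    return "normal"


# Hourly limits from the sample response: warn at $2.00, block at $5.00
for cost in (0.0135, 1.70, 4.10, 5.12):
    print(f"${cost:.2f} -> {limit_level(cost, 2.0, 5.0)}")
```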

Checking Limit Status

# Make a request, then inspect the body and headers for limit warnings
response = requests.post(
    f"{API_URL}/generate",
    headers={"X-API-Key": API_KEY},
    json={...}  # request body elided
)

# Check for warnings in response
result = response.json()
if result.get("warnings"):
    for warning in result["warnings"]:
        print(f"Warning: {warning}")
        # e.g., "Approaching hourly limit: 82% used ($4.10 of $5.00)"

# Check response headers
if "X-RateLimit-Warning" in response.headers:
    print(f"Rate limit warning: {response.headers['X-RateLimit-Warning']}")

Rate Limit Response

When a hard limit is hit:

{
  "status": "error",
  "error": {
    "code": "RATE_LIMIT_EXCEEDED",
    "message": "Hourly cost limit exceeded",
    "details": {
      "limit_type": "api_key",
      "period": "hourly",
      "current": 5.12,
      "limit": 5.0,
      "reset_at": "2025-01-15T15:00:00Z"
    }
  }
}

HTTP Status: 429 Too Many Requests
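
A client can use `reset_at` to decide how long to pause before retrying. The following is a minimal sketch, assuming the error body has the shape shown above. The helper name `seconds_until_reset` is hypothetical, and the `now` parameter exists only to make the example reproducible.

```python
from datetime import datetime, timezone


def seconds_until_reset(error_body, now=None):
    """Return seconds to wait until `reset_at`, or None if this is not a
    rate-limit error with a reset time."""
    error = error_body.get("error", {})
    if error.get("code") != "RATE_LIMIT_EXCEEDED":
        return None
    reset_at = error.get("details", {}).get("reset_at")
    if reset_at is None:
        return None
    # fromisoformat() accepts a trailing "Z" only from Python 3.11 on,
    # so convert it to an explicit UTC offset first.
    reset_time = datetime.fromisoformat(reset_at.replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    # Never return a negative wait if the reset time has already passed
    return max((reset_time - now).total_seconds(), 0.0)


body = {
    "status": "error",
    "error": {
        "code": "RATE_LIMIT_EXCEEDED",
        "details": {"period": "hourly", "reset_at": "2025-01-15T15:00:00Z"},
    },
}
wait = seconds_until_reset(body, now=datetime(2025, 1, 15, 14, 32, tzinfo=timezone.utc))
print(f"Retry in {wait:.0f} seconds")
```

In a real client, the body would come from `response.json()` after checking `response.status_code == 429`.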

Cost Calculation

Costs are calculated per model based on token usage:

# Cost formula
input_cost = (input_tokens / 1_000_000) * cost_per_million_input
output_cost = (output_tokens / 1_000_000) * cost_per_million_output
total_cost = input_cost + output_cost

Model Pricing Reference (Google Gemini)

Model	Input (per 1M tokens)	Output (per 1M tokens)
gemini-2.0-flash	$0.075	$0.30
gemini-2.0-flash-lite	$0.0375	$0.15
gemini-3-pro-preview	$1.25	$10.00
gemini-3-pro-image	N/A	$0.039 per image
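
As a worked example, the formula and the table can be combined to estimate the cost of a request before sending it. This is a sketch, not the service's own implementation: the `PRICING` dictionary and the `estimate_cost` helper are hypothetical names, and the prices are copied from the table above. The image model is priced per image and is left out.

```python
# Per-million-token prices in USD, copied from the pricing table
PRICING = {
    "gemini-2.0-flash": {"input": 0.075, "output": 0.30},
    "gemini-2.0-flash-lite": {"input": 0.0375, "output": 0.15},
    "gemini-3-pro-preview": {"input": 1.25, "output": 10.00},
}


def estimate_cost(model, input_tokens, output_tokens):
    """Apply the cost formula to one request for a token-priced model."""
    prices = PRICING[model]
    input_cost = (input_tokens / 1_000_000) * prices["input"]
    output_cost = (output_tokens / 1_000_000) * prices["output"]
    return {
        "input": input_cost,
        "output": output_cost,
        "total": input_cost + output_cost,
    }


cost = estimate_cost("gemini-2.0-flash", 15_000, 8_000)
print(f"Input: ${cost['input']:.6f}")
print(f"Output: ${cost['output']:.6f}")
print(f"Total: ${cost['total']:.6f}")
```

For 15,000 input tokens and 8,000 output tokens on gemini-2.0-flash, this works out to 0.015 × $0.075 + 0.008 × $0.30 = $0.001125 + $0.0024 = $0.003525.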

Getting Cost from Response

# Remaining request arguments (headers, JSON body) are elided
response = requests.post(f"{API_URL}/generate", ...)
result = response.json()

if result["status"] == "success":
    cost = result["cost"]
    print(f"Input cost: ${cost['input']:.6f}")
    print(f"Output cost: ${cost['output']:.6f}")
    print(f"Total cost: ${cost['total']:.6f}")

    usage = result["usage"]
    print(f"Tokens: {usage['input_tokens']} in, {usage['output_tokens']} out")

Admin API for Analytics

Get Service Usage (Admin)

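# Detailed service analytics
# Assumes `session` is an authenticated requests.Session for the admin API,
# and that ADMIN_URL and service_id are defined elsewhere.
stats = session.get(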
    f"{ADMIN_URL}/services/{service_id}/stats",
    params={
        "period": "day",  # hour, day, month
        "start": "2025-01-01",
        "end": "2025-01-15"
    }
).json()

for day in stats["daily"]:
    print(f"{day['date']}: {day['requests']} requests, ${day['cost']:.2f}")

Get API Key Usage (Admin)

# Per-key analytics
key_stats = session.get(
    f"{ADMIN_URL}/api-keys/{key_id}/stats",
    params={"period": "month"}
).json()

print(f"Total requests: {key_stats['total_requests']}")
print(f"Total cost: ${key_stats['total_cost']:.2f}")
print(f"Most used model: {key_stats['top_model']}")

Export Usage Logs

# Export to CSV
export = session.get(
    f"{ADMIN_URL}/usage/export",
    params={
        "format": "csv",
        "start": "2025-01-01",
        "end": "2025-01-31",
        "service_id": service_id
    }
)

# Stop here if the export failed, so an error body is not saved as a CSV
export.raise_for_status()

with open("usage_report.csv", "wb") as f:
    f.write(export.content)

Setting Up Alerts

Webhook Notifications

Configure webhooks in the admin dashboard to receive alerts:

{
  "webhook_url": "https://your-server.com/alerts",
  "events": [
    "limit_warning",
    "limit_exceeded",
    "service_error"
  ],
  "threshold_percent": 80
}

Alert Payload Example

{
  "event": "limit_warning",
  "timestamp": "2025-01-15T14:32:00Z",
  "api_key": "image-workflow-prod",
  "service": "google-gemini",
  "details": {
    "period": "hourly",
    "current": 4.10,
    "limit": 5.00,
    "percent": 82
  }
}
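
On the receiving side, the payload can be reduced to a one-line message for logs or a chat channel. The sketch below assumes the payload shape from the example above. The function name `describe_alert` is hypothetical, and because the payload for `service_error` is not shown here, that event falls back to a generic message.

```python
def describe_alert(payload):
    """Turn an alert payload into a short human-readable message."""
    event = payload.get("event", "unknown")
    source = f"{payload.get('api_key', '?')} on {payload.get('service', '?')}"
    details = payload.get("details", {})

    is_limit_event = event in ("limit_warning", "limit_exceeded")
    has_amounts = all(key in details for key in ("current", "limit"))
    if is_limit_event and has_amounts:
        return (
            f"[{event}] {source}: {details.get('period', 'unknown')} cost "
            f"${details['current']:.2f} of ${details['limit']:.2f} "
            f"({details.get('percent', '?')}%)"
        )
    # Other events (or incomplete payloads) get a generic summary
    return f"[{event}] {source}"


alert = {
    "event": "limit_warning",
    "timestamp": "2025-01-15T14:32:00Z",
    "api_key": "image-workflow-prod",
    "service": "google-gemini",
    "details": {"period": "hourly", "current": 4.10, "limit": 5.00, "percent": 82},
}
print(describe_alert(alert))
```

A webhook endpoint would parse the request body as JSON and pass the resulting dictionary to this function.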

Best Practices

Set Conservative Limits - Start low and increase based on actual usage
Monitor Daily - Check the dashboard or usage API regularly
Use Warnings - Set warn thresholds at 60-80% of block thresholds
Separate Keys - Use different API keys for different applications
Export Regularly - Keep usage logs for accounting and analysis