Monitoring Usage and Costs

This guide covers tracking API usage, understanding costs, and managing rate limits.

Real-Time Usage Tracking

Every API request is logged with:

  • Timestamp and request ID
  • Model and service used
  • Input/output token counts
  • Calculated costs
  • Success/failure status

Checking Usage via API

Current Period Usage

import requests

API_URL = "http://localhost:8000/api/v1"
API_KEY = "sk_your_api_key"

response = requests.get(
    f"{API_URL}/usage",
    headers={"X-API-Key": API_KEY}
)

response.raise_for_status()  # surface HTTP errors (e.g. 401, 429) before parsing the body
usage = response.json()
print(f"Current Hour: ${usage['hour']['cost']:.4f}")
print(f"Current Day: ${usage['day']['cost']:.4f}")
print(f"Current Month: ${usage['month']['cost']:.4f}")
print(f"Total (All Time): ${usage['total']['cost']:.4f}")

Response Structure

{
  "api_key": "image-workflow-prod",
  "service": "google-gemini",
  "hour": {
    "requests": 42,
    "tokens": {"input": 15000, "output": 8000},
    "cost": 0.0135,
    "limit_warn": 2.0,
    "limit_block": 5.0,
    "percent_used": 0.27
  },
  "day": {
    "requests": 312,
    "tokens": {"input": 125000, "output": 62000},
    "cost": 0.1245,
    "limit_warn": 20.0,
    "limit_block": 50.0,
    "percent_used": 0.25
  },
  "month": {
    "requests": 4521,
    "tokens": {"input": 1800000, "output": 920000},
    "cost": 1.82,
    "limit_warn": 500.0,
    "limit_block": 1000.0,
    "percent_used": 0.18
  },
  "total": {
    "requests": 12450,
    "tokens": {"input": 5200000, "output": 2600000},
    "cost": 5.24
  }
}
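
A common use of this response is working out how much budget is left in each period. The sketch below is one way to do that, assuming the field names shown above; `budget_headroom` is a hypothetical helper, not part of the API. In the sample, `percent_used` appears to be measured against `limit_block` (0.0135 of 5.0 is 0.27%), but that is inferred from the numbers rather than stated.

```python
def budget_headroom(usage: dict) -> dict:
    """Return dollars remaining before the warn and block limits, per period."""
    headroom = {}
    for period in ("hour", "day", "month"):
        window = usage[period]
        headroom[period] = {
            # Clamp at zero so an exceeded limit never shows negative headroom
            "until_warn": max(window["limit_warn"] - window["cost"], 0.0),
            "until_block": max(window["limit_block"] - window["cost"], 0.0),
        }
    return headroom


# Values trimmed from the response structure above
sample = {
    "hour": {"cost": 0.0135, "limit_warn": 2.0, "limit_block": 5.0},
    "day": {"cost": 0.1245, "limit_warn": 20.0, "limit_block": 50.0},
    "month": {"cost": 1.82, "limit_warn": 500.0, "limit_block": 1000.0},
}

for period, left in budget_headroom(sample).items():
    print(f"{period}: ${left['until_warn']:.4f} until warn, "
          f"${left['until_block']:.4f} until block")
```

The `total` block carries no limits in the sample response, so it is skipped here.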

Admin Dashboard Monitoring

The admin dashboard provides visual analytics:

Services Overview

  • Total requests per service
  • Cost breakdown by model
  • Error rates and trends
  • Active API keys count

Usage Analytics

  • Hourly/daily/monthly charts
  • Cost forecasting
  • Top consumers
  • Model popularity

Audit Logs

  • All API requests with details
  • Authentication events
  • Configuration changes
  • Rate limit events

Rate Limit Monitoring

Understanding Limit Levels

Level        Trigger            Behavior
Normal       < 80% of warn      Request proceeds
Warning      ≥ 80% of warn      Request proceeds + warning header
Soft Limit   ≥ warn, < block    Request proceeds + warning header
Hard Limit   ≥ block            Request rejected (429)
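
The levels can be reproduced client-side from a period's `cost`, `limit_warn`, and `limit_block`. This is a minimal sketch of the table's thresholds, not the server's actual implementation; the function name and the returned labels are hypothetical.

```python
def limit_level(cost: float, limit_warn: float, limit_block: float) -> str:
    """Classify current spend using the thresholds from the table above."""
    if cost >= limit_block:
        return "hard_limit"   # request would be rejected with 429
    if cost >= limit_warn:
        return "soft_limit"   # request proceeds, warning header attached
    if cost >= 0.8 * limit_warn:
        return "warning"      # request proceeds, warning header attached
    return "normal"


# Hourly limits from the sample usage response: warn at $2.00, block at $5.00
for spend in (0.0135, 1.70, 4.10, 5.12):
    print(f"${spend:.4f} -> {limit_level(spend, 2.0, 5.0)}")
```

Checking the hard limit first keeps the branches mutually exclusive without repeating the upper bounds.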

Checking Limit Status

# Make a request and check headers
response = requests.post(
    f"{API_URL}/generate",
    headers={"X-API-Key": API_KEY},
    json={...}
)

# Check for warnings in response
result = response.json()
if result.get("warnings"):
    for warning in result["warnings"]:
        print(f"Warning: {warning}")
        # e.g., "Approaching hourly limit: 82% used ($4.10 of $5.00)"

# Check response headers
if "X-RateLimit-Warning" in response.headers:
    print(f"Rate limit warning: {response.headers['X-RateLimit-Warning']}")

Rate Limit Response

When a hard limit is hit:

{
  "status": "error",
  "error": {
    "code": "RATE_LIMIT_EXCEEDED",
    "message": "Hourly cost limit exceeded",
    "details": {
      "limit_type": "api_key",
      "period": "hourly",
      "current": 5.12,
      "limit": 5.0,
      "reset_at": "2025-01-15T15:00:00Z"
    }
  }
}

HTTP Status: 429 Too Many Requests
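
A client can use `reset_at` to decide how long to pause before retrying. The sketch below assumes the error body has exactly the shape shown above and that `reset_at` is an ISO 8601 UTC timestamp; `seconds_until_reset` is a hypothetical helper.

```python
from datetime import datetime, timezone
from typing import Optional


def seconds_until_reset(error_body: dict, now: Optional[datetime] = None) -> float:
    """Seconds to wait until the limit window resets (0 if already past)."""
    reset_at = error_body["error"]["details"]["reset_at"]
    # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11,
    # so rewrite it as an explicit UTC offset first.
    reset = datetime.fromisoformat(reset_at.replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    return max((reset - now).total_seconds(), 0.0)


# Error body copied from the rate limit response above
error_body = {
    "status": "error",
    "error": {
        "code": "RATE_LIMIT_EXCEEDED",
        "details": {"period": "hourly", "reset_at": "2025-01-15T15:00:00Z"},
    },
}

fixed_now = datetime(2025, 1, 15, 14, 32, tzinfo=timezone.utc)
print(seconds_until_reset(error_body, now=fixed_now))  # 28 minutes -> 1680.0
```

In real use, omit `now` and pass the result to `time.sleep()` when `response.status_code == 429`.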

Cost Calculation

Costs are calculated per model based on token usage:

# Cost formula
input_cost = (input_tokens / 1_000_000) * cost_per_million_input
output_cost = (output_tokens / 1_000_000) * cost_per_million_output
total_cost = input_cost + output_cost

Model Pricing Reference (Google Gemini)

Model                   Input (per 1M tokens)   Output (per 1M tokens)
gemini-2.0-flash        $0.075                  $0.30
gemini-2.0-flash-lite   $0.0375                 $0.15
gemini-3-pro-preview    $1.25                   $10.00
gemini-3-pro-image      N/A                     $0.039 per image
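
To see the formula and the price table working together, the following sketch estimates the cost of a single request. The `PRICING` dictionary and `estimate_cost` are illustrative names, with prices copied from the table above; the image model is left out because it is billed per image rather than per token.

```python
# (input, output) price in dollars per 1M tokens, from the pricing table
PRICING = {
    "gemini-2.0-flash": (0.075, 0.30),
    "gemini-2.0-flash-lite": (0.0375, 0.15),
    "gemini-3-pro-preview": (1.25, 10.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> dict:
    """Apply the cost formula to one request's token counts."""
    price_in, price_out = PRICING[model]
    input_cost = (input_tokens / 1_000_000) * price_in
    output_cost = (output_tokens / 1_000_000) * price_out
    return {
        "input": input_cost,
        "output": output_cost,
        "total": input_cost + output_cost,
    }


cost = estimate_cost("gemini-2.0-flash", input_tokens=15_000, output_tokens=8_000)
print(f"Input: ${cost['input']:.6f}, output: ${cost['output']:.6f}, "
      f"total: ${cost['total']:.6f}")
```

For 15,000 input and 8,000 output tokens on gemini-2.0-flash this works out to $0.001125 + $0.002400 = $0.003525.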

Getting Cost from Response

response = requests.post(f"{API_URL}/generate", ...)
result = response.json()

if result["status"] == "success":
    cost = result["cost"]
    print(f"Input cost: ${cost['input']:.6f}")
    print(f"Output cost: ${cost['output']:.6f}")
    print(f"Total cost: ${cost['total']:.6f}")

    usage = result["usage"]
    print(f"Tokens: {usage['input_tokens']} in, {usage['output_tokens']} out")

Admin API for Analytics

Get Service Usage (Admin)

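# Detailed service analytics
# Assumes `session` is an already-authenticated HTTP session (e.g. a requests.Session)
# and that ADMIN_URL and service_id are defined elsewhere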
stats = session.get(
    f"{ADMIN_URL}/services/{service_id}/stats",
    params={
        "period": "day",  # hour, day, month
        "start": "2025-01-01",
        "end": "2025-01-15"
    }
).json()

for day in stats["daily"]:
    print(f"{day['date']}: {day['requests']} requests, ${day['cost']:.2f}")
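
Once the daily rows are loaded, a short summary is often more useful than the raw listing. The sketch below assumes each row has the `date`, `requests`, and `cost` fields used in the loop above; `summarize_daily` is a hypothetical helper and the sample rows are made up for illustration.

```python
def summarize_daily(daily: list) -> dict:
    """Total cost, average daily cost, and the date with the highest cost."""
    if not daily:
        return {"total_cost": 0.0, "average_cost": 0.0, "peak_date": None}
    total = sum(row["cost"] for row in daily)
    peak = max(daily, key=lambda row: row["cost"])
    return {
        "total_cost": total,
        "average_cost": total / len(daily),
        "peak_date": peak["date"],
    }


# Made-up rows in the same shape as stats["daily"]
sample_daily = [
    {"date": "2025-01-01", "requests": 300, "cost": 0.12},
    {"date": "2025-01-02", "requests": 450, "cost": 0.21},
    {"date": "2025-01-03", "requests": 280, "cost": 0.09},
]

summary = summarize_daily(sample_daily)
print(f"Total ${summary['total_cost']:.2f}, "
      f"average ${summary['average_cost']:.2f}/day, "
      f"peak on {summary['peak_date']}")
```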

Get API Key Usage (Admin)

# Per-key analytics (reuses `session` and ADMIN_URL; key_id identifies the API key to inspect)
key_stats = session.get(
    f"{ADMIN_URL}/api-keys/{key_id}/stats",
    params={"period": "month"}
).json()

print(f"Total requests: {key_stats['total_requests']}")
print(f"Total cost: ${key_stats['total_cost']:.2f}")
print(f"Most used model: {key_stats['top_model']}")

Export Usage Logs

# Export to CSV
export = session.get(
    f"{ADMIN_URL}/usage/export",
    params={
        "format": "csv",
        "start": "2025-01-01",
        "end": "2025-01-31",
        "service_id": service_id
    }
)

with open("usage_report.csv", "wb") as f:
    f.write(export.content)

Setting Up Alerts

Webhook Notifications

Configure webhooks in the admin dashboard to receive alerts:

{
  "webhook_url": "https://your-server.com/alerts",
  "events": [
    "limit_warning",
    "limit_exceeded",
    "service_error"
  ],
  "threshold_percent": 80
}

Alert Payload Example

{
  "event": "limit_warning",
  "timestamp": "2025-01-15T14:32:00Z",
  "api_key": "image-workflow-prod",
  "service": "google-gemini",
  "details": {
    "period": "hourly",
    "current": 4.10,
    "limit": 5.00,
    "percent": 82
  }
}
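
On the receiving side, the payload can be turned into a readable message before it is forwarded to chat or email. This is a minimal sketch that assumes the payload shape shown above; `describe_alert` is a hypothetical helper, and because the `details` shape for `service_error` events is not documented here, those events fall back to printing `details` as-is.

```python
def describe_alert(payload: dict) -> str:
    """Format a webhook alert payload as a one-line message."""
    prefix = (f"[{payload['event']}] {payload.get('api_key', '?')} "
              f"on {payload.get('service', '?')}")
    details = payload.get("details", {})
    limit_fields = ("period", "current", "limit", "percent")
    is_limit_event = payload["event"] in ("limit_warning", "limit_exceeded")
    if is_limit_event and all(field in details for field in limit_fields):
        return (f"{prefix}: {details['period']} spend "
                f"${details['current']:.2f} of ${details['limit']:.2f} "
                f"({details['percent']}%)")
    # Unknown or undocumented detail shape: include it verbatim
    return f"{prefix}: {details}"


# Payload copied from the alert example above
alert = {
    "event": "limit_warning",
    "timestamp": "2025-01-15T14:32:00Z",
    "api_key": "image-workflow-prod",
    "service": "google-gemini",
    "details": {"period": "hourly", "current": 4.10, "limit": 5.00, "percent": 82},
}

print(describe_alert(alert))
```

Keeping the formatter a pure function makes it easy to call from whichever web framework hosts the webhook endpoint.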

Best Practices

  1. Set Conservative Limits - Start low and increase based on actual usage
  2. Monitor Daily - Check the dashboard or usage API regularly
  3. Use Warnings - Set warn thresholds at 60-80% of block thresholds
  4. Separate Keys - Use different API keys for different applications
  5. Export Regularly - Keep usage logs for accounting and analysis

Next Steps