Monitoring Usage and Costs
This guide covers tracking API usage, understanding costs, and managing rate limits.
Real-Time Usage Tracking
Every API request is logged with:
- Timestamp and request ID
- Model and service used
- Input/output token counts
- Calculated costs
- Success/failure status
Checking Usage via API
Current Period Usage
import requests

API_URL = "http://localhost:8000/api/v1"
API_KEY = "sk_your_api_key"

# Fetch usage for the current hour, day, and month, plus the all-time total
response = requests.get(
    f"{API_URL}/usage",
    headers={"X-API-Key": API_KEY}
)
usage = response.json()

print(f"Current Hour: ${usage['hour']['cost']:.4f}")
print(f"Current Day: ${usage['day']['cost']:.4f}")
print(f"Current Month: ${usage['month']['cost']:.4f}")
print(f"Total (All Time): ${usage['total']['cost']:.4f}")
Response Structure
{
  "api_key": "image-workflow-prod",
  "service": "google-gemini",
  "hour": {
    "requests": 42,
    "tokens": {"input": 15000, "output": 8000},
    "cost": 0.0135,
    "limit_warn": 2.0,
    "limit_block": 5.0,
    "percent_used": 0.27
  },
  "day": {
    "requests": 312,
    "tokens": {"input": 125000, "output": 62000},
    "cost": 0.1245,
    "limit_warn": 20.0,
    "limit_block": 50.0,
    "percent_used": 0.25
  },
  "month": {
    "requests": 4521,
    "tokens": {"input": 1800000, "output": 920000},
    "cost": 1.82,
    "limit_warn": 500.0,
    "limit_block": 1000.0,
    "percent_used": 0.18
  },
  "total": {
    "requests": 12450,
    "tokens": {"input": 5200000, "output": 2600000},
    "cost": 5.24
  }
}
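As a rough illustration of reading this structure, the sketch below computes how much spend remains before the hard (block) limit in each period. The helper name `remaining_budget` is hypothetical and not part of the API; the sample dictionary is a trimmed copy of the response above. In the sample, `percent_used` appears to be a percentage of `limit_block` (0.0135 of 5.0 is 0.27%), but that reading is an assumption.

```python
def remaining_budget(usage: dict) -> dict:
    """Return the spend left before the block limit for each limited period."""
    remaining = {}
    for period in ("hour", "day", "month"):  # "total" carries no limits
        data = usage[period]
        remaining[period] = max(data["limit_block"] - data["cost"], 0.0)
    return remaining


# Trimmed copy of the sample response (only the fields this helper reads).
sample = {
    "hour": {"cost": 0.0135, "limit_warn": 2.0, "limit_block": 5.0},
    "day": {"cost": 0.1245, "limit_warn": 20.0, "limit_block": 50.0},
    "month": {"cost": 1.82, "limit_warn": 500.0, "limit_block": 1000.0},
    "total": {"cost": 5.24},
}

for period, left in remaining_budget(sample).items():
    print(f"{period}: ${left:.4f} left before blocking")
```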
Admin Dashboard Monitoring
The admin dashboard provides visual analytics:
Services Overview
- Total requests per service
- Cost breakdown by model
- Error rates and trends
- Active API keys count
Usage Analytics
- Hourly/daily/monthly charts
- Cost forecasting
- Top consumers
- Model popularity
Audit Logs
- All API requests with details
- Authentication events
- Configuration changes
- Rate limit events
Rate Limit Monitoring
Understanding Limit Levels
| Level | Trigger | Behavior |
|---|---|---|
| Normal | < 80% of warn | Request proceeds |
| Warning | ≥ 80% of warn, < warn | Request proceeds + warning header |
| Soft Limit | ≥ warn, < block | Request proceeds + warning header |
| Hard Limit | ≥ block | Request rejected (429) |
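The table can be read as a simple cascade of comparisons. The following is a minimal sketch of that logic, assuming the thresholds are evaluated against the period's current cost; the function name `limit_level` and the returned labels are illustrative and are not taken from the service itself.

```python
def limit_level(cost: float, warn: float, block: float) -> str:
    """Classify a period's current cost against its warn and block limits."""
    if cost >= block:
        return "hard_limit"   # request rejected with HTTP 429
    if cost >= warn:
        return "soft_limit"   # request proceeds with a warning header
    if cost >= 0.8 * warn:
        return "warning"      # request proceeds with a warning header
    return "normal"


# Hourly limits from the sample usage response: warn at $2.00, block at $5.00.
for cost in (0.0135, 1.70, 4.10, 5.12):
    print(f"${cost:.4f} -> {limit_level(cost, warn=2.0, block=5.0)}")
```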
Checking Limit Status
# Make a request and check headers
response = requests.post(
    f"{API_URL}/generate",
    headers={"X-API-Key": API_KEY},
    json={...}
)

# Check for warnings in response
result = response.json()
if result.get("warnings"):
    for warning in result["warnings"]:
        print(f"Warning: {warning}")
        # e.g., "Approaching hourly limit: 82% used ($4.10 of $5.00)"

# Check response headers
if "X-RateLimit-Warning" in response.headers:
    print(f"Rate limit warning: {response.headers['X-RateLimit-Warning']}")
Rate Limit Response
When a hard limit is hit, the request is rejected with the following body:
{
  "status": "error",
  "error": {
    "code": "RATE_LIMIT_EXCEEDED",
    "message": "Hourly cost limit exceeded",
    "details": {
      "limit_type": "api_key",
      "period": "hourly",
      "current": 5.12,
      "limit": 5.0,
      "reset_at": "2025-01-15T15:00:00Z"
    }
  }
}
HTTP Status: 429 Too Many Requests
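A client can use `reset_at` to decide how long to pause before retrying. The sketch below is one possible approach, assuming `reset_at` is always an ISO 8601 UTC timestamp with a trailing `Z` as in the example; the helper name `seconds_until_reset` is hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def seconds_until_reset(error_body: dict, now: Optional[datetime] = None) -> float:
    """Return the seconds remaining until the limit window resets (never negative)."""
    details = error_body["error"]["details"]
    # Replace the trailing "Z" so fromisoformat() accepts it on older Python versions.
    reset_at = datetime.fromisoformat(details["reset_at"].replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    return max((reset_at - now).total_seconds(), 0.0)


# Error body copied from the example above (trimmed to the fields used here).
body = {
    "status": "error",
    "error": {
        "code": "RATE_LIMIT_EXCEEDED",
        "details": {"period": "hourly", "reset_at": "2025-01-15T15:00:00Z"},
    },
}

# A fixed "now" keeps the example deterministic.
fixed_now = datetime(2025, 1, 15, 14, 32, 0, tzinfo=timezone.utc)
print(seconds_until_reset(body, now=fixed_now))
```

In a real client, the returned value could be passed to `time.sleep()` before the request is retried.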
Cost Calculation
Costs are calculated per model based on token usage:
# Cost formula
input_cost = (input_tokens / 1_000_000) * cost_per_million_input
output_cost = (output_tokens / 1_000_000) * cost_per_million_output
total_cost = input_cost + output_cost
Model Pricing Reference (Google Gemini)
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| gemini-2.0-flash | $0.075 | $0.30 |
| gemini-2.0-flash-lite | $0.0375 | $0.15 |
| gemini-3-pro-preview | $1.25 | $10.00 |
| gemini-3-pro-image | N/A | $0.039 per image |
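To make the formula concrete, here is a small worked example using the token-priced rows of the table. The `PRICING` dictionary and the `token_cost` helper are illustrative names, not part of the API, and the per-image model is left out because it is not priced per token.

```python
# (input, output) prices in USD per 1M tokens, copied from the table above.
PRICING = {
    "gemini-2.0-flash": (0.075, 0.30),
    "gemini-2.0-flash-lite": (0.0375, 0.15),
    "gemini-3-pro-preview": (1.25, 10.00),
}


def token_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Apply the cost formula: tokens / 1M * price per million, summed over both sides."""
    cost_per_million_input, cost_per_million_output = PRICING[model]
    input_cost = (input_tokens / 1_000_000) * cost_per_million_input
    output_cost = (output_tokens / 1_000_000) * cost_per_million_output
    return input_cost + output_cost


# 15,000 input and 8,000 output tokens on gemini-2.0-flash:
# 0.015 * 0.075 + 0.008 * 0.30 = 0.001125 + 0.0024
print(f"${token_cost('gemini-2.0-flash', 15_000, 8_000):.6f}")
```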
Getting Cost from Response
response = requests.post(f"{API_URL}/generate", ...)
result = response.json()

if result["status"] == "success":
    cost = result["cost"]
    print(f"Input cost: ${cost['input']:.6f}")
    print(f"Output cost: ${cost['output']:.6f}")
    print(f"Total cost: ${cost['total']:.6f}")

    usage = result["usage"]
    print(f"Tokens: {usage['input_tokens']} in, {usage['output_tokens']} out")
Admin API for Analytics
Get Service Usage (Admin)
# Detailed service analytics
# Assumes `session` is an authenticated admin requests.Session and that
# ADMIN_URL and service_id are already defined.
stats = session.get(
    f"{ADMIN_URL}/services/{service_id}/stats",
    params={
        "period": "day",  # one of: hour, day, month
        "start": "2025-01-01",
        "end": "2025-01-15"
    }
).json()

for day in stats["daily"]:
    print(f"{day['date']}: {day['requests']} requests, ${day['cost']:.2f}")
Get API Key Usage (Admin)
# Per-key analytics
key_stats = session.get(
    f"{ADMIN_URL}/api-keys/{key_id}/stats",
    params={"period": "month"}
).json()

print(f"Total requests: {key_stats['total_requests']}")
print(f"Total cost: ${key_stats['total_cost']:.2f}")
print(f"Most used model: {key_stats['top_model']}")
Export Usage Logs
# Export to CSV
export = session.get(
    f"{ADMIN_URL}/usage/export",
    params={
        "format": "csv",
        "start": "2025-01-01",
        "end": "2025-01-31",
        "service_id": service_id
    }
)

# Write the raw response bytes to disk
with open("usage_report.csv", "wb") as f:
    f.write(export.content)
Setting Up Alerts
Webhook Notifications
Configure webhooks in the admin dashboard to receive alerts:
{
  "webhook_url": "https://your-server.com/alerts",
  "events": [
    "limit_warning",
    "limit_exceeded",
    "service_error"
  ],
  "threshold_percent": 80
}
Alert Payload Example
{
  "event": "limit_warning",
  "timestamp": "2025-01-15T14:32:00Z",
  "api_key": "image-workflow-prod",
  "service": "google-gemini",
  "details": {
    "period": "hourly",
    "current": 4.10,
    "limit": 5.00,
    "percent": 82
  }
}
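On the receiving side, a handler will usually turn this payload into a readable message before forwarding it to chat or email. The following is a minimal sketch of that step only, with no web framework involved. The function name `format_alert` is hypothetical, and the sketch assumes that limit events carry the `details` fields shown above, while other events such as `service_error` may not.

```python
import json


def format_alert(payload: dict) -> str:
    """Build a one-line summary from a webhook alert payload."""
    header = f"[{payload['event']}] {payload['api_key']} on {payload['service']}"
    details = payload.get("details", {})
    # Only limit events are assumed to include spend figures.
    if {"period", "current", "limit", "percent"} <= details.keys():
        return (
            f"{header}: {details['period']} spend "
            f"${details['current']:.2f} of ${details['limit']:.2f} "
            f"({details['percent']}%)"
        )
    return header


# Payload copied from the example above, parsed as the raw request body would be.
raw_body = """
{
  "event": "limit_warning",
  "timestamp": "2025-01-15T14:32:00Z",
  "api_key": "image-workflow-prod",
  "service": "google-gemini",
  "details": {"period": "hourly", "current": 4.10, "limit": 5.00, "percent": 82}
}
"""
print(format_alert(json.loads(raw_body)))
```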
Best Practices
- Set conservative limits: start low and increase based on actual usage
- Monitor daily: check the dashboard or usage API regularly
- Use warnings: set warn thresholds at 60-80% of block thresholds
- Separate keys: use different API keys for different applications
- Export regularly: keep usage logs for accounting and analysis