# Grafana, Prometheus metrics [BETA]
LiteLLM exposes a `/metrics` endpoint for Prometheus to poll.
## Quick Start
If you're using the LiteLLM CLI with `litellm --config proxy_config.yaml`, you need to run `pip install prometheus_client==0.20.0`. This package comes pre-installed on the LiteLLM Docker image.
Add this to your proxy `config.yaml`:
```yaml
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: gpt-3.5-turbo
litellm_settings:
  success_callback: ["prometheus"]
  failure_callback: ["prometheus"]
```
Start the proxy:
```shell
litellm --config config.yaml --debug
```
Send a test request:
```shell
curl --location 'http://0.0.0.0:4000/chat/completions' \
    --header 'Content-Type: application/json' \
    --data '{
    "model": "gpt-3.5-turbo",
    "messages": [
        {
        "role": "user",
        "content": "what llm are you"
        }
    ]
}'
```
View your metrics on `/metrics` by visiting `http://localhost:4000/metrics`:

```shell
http://localhost:4000/metrics

# <proxy_base_url>/metrics
```
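To have Prometheus poll this endpoint, point a scrape job at the proxy. Below is a minimal sketch of a `prometheus.yml` scrape config, assuming the proxy is reachable at `localhost:4000`; the job name and scrape interval are illustrative choices, not values required by LiteLLM.

```yaml
# prometheus.yml (sketch) -- job name, interval, and target are assumptions
scrape_configs:
  - job_name: "litellm-proxy"        # any name you like
    scrape_interval: 30s             # illustrative polling interval
    metrics_path: /metrics           # the endpoint LiteLLM exposes
    static_configs:
      - targets: ["localhost:4000"]  # <proxy_base_url> host:port
```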
## Metrics Tracked
| Metric Name | Description |
|---|---|
| `litellm_requests_metric` | Number of requests made, per `user`, `key`, `model`, `team`, `end-user` |
| `litellm_spend_metric` | Total spend, per `user`, `key`, `model`, `team`, `end-user` |
| `litellm_total_tokens` | Input + output tokens, per `user`, `key`, `model`, `team`, `end-user` |
| `litellm_llm_api_failed_requests_metric` | Number of failed LLM API requests, per `user`, `key`, `model`, `team`, `end-user` |
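As one way to use these metrics, here is a sketch of a Prometheus alerting rule that fires when failed LLM API requests are observed. It assumes `litellm_llm_api_failed_requests_metric` behaves as a counter; the window, threshold, and labels are illustrative.

```yaml
# alert_rules.yml (sketch) -- assumes the failed-requests metric is a counter
groups:
  - name: litellm-alerts
    rules:
      - alert: LiteLLMFailedRequests
        # any failed LLM API requests in the last 5 minutes (illustrative window)
        expr: increase(litellm_llm_api_failed_requests_metric[5m]) > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "LiteLLM proxy reported failed LLM API requests"
```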
## Monitor System Health
To monitor the health of services adjacent to LiteLLM (Redis / Postgres), add this to your proxy `config.yaml`:
```yaml
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: gpt-3.5-turbo
litellm_settings:
  service_callback: ["prometheus_system"]
```
| Metric Name | Description |
|---|---|
| `litellm_redis_latency` | Histogram of latency for Redis calls |
| `litellm_redis_fails` | Number of failed Redis calls |
| `litellm_self_latency` | Histogram of latency for successful LiteLLM API calls |
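Because `litellm_redis_latency` is exposed as a histogram, you can derive percentiles from it in Prometheus. Below is a sketch of a recording rule for p95 Redis latency, assuming the metric follows the standard Prometheus histogram convention (underlying `litellm_redis_latency_bucket` series with an `le` label); the rule name and window are illustrative.

```yaml
# recording_rules.yml (sketch) -- assumes litellm_redis_latency is a standard
# Prometheus histogram that exposes litellm_redis_latency_bucket series
groups:
  - name: litellm-service-health
    rules:
      - record: litellm:redis_latency:p95
        expr: histogram_quantile(0.95, sum(rate(litellm_redis_latency_bucket[5m])) by (le))
```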