
🚀 Nexent LLM Monitoring System

An enterprise-grade monitoring solution purpose-built for tracking LLM token generation speed and performance.

📊 System Architecture

```
┌─────────────────────────────────────────────────────────┐
│              Nexent LLM Monitoring System               │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  Nexent API ──► OpenTelemetry ──► Jaeger (Tracing)      │
│      │                │                                 │
│      │                └──────► Prometheus (Metrics)     │
│      │                            │                     │
│      └─► OpenAI LLM               └──► Grafana          │
│          (Token Monitoring)            (Visualization)  │
└─────────────────────────────────────────────────────────┘
```

⚡ Quick Start (5 minutes)

```bash
# 1. Start monitoring services
./docker/start-monitoring.sh

# 2. Install performance monitoring dependencies
uv sync --extra performance

# 3. Enable monitoring
export ENABLE_TELEMETRY=true

# 4. Start the backend service
python backend/main_service.py
```

📊 Access Monitoring Interfaces

| Interface | URL | Purpose |
| --- | --- | --- |
| Grafana Dashboard | http://localhost:3005 | LLM Performance Monitoring |
| Jaeger Tracing | http://localhost:16686 | Request Trace Analysis |
| Prometheus Metrics | http://localhost:9090 | Raw Monitoring Data |

๐Ÿ” Grafana Login Information โ€‹

When you first access Grafana (http://localhost:3005), you need to log in:

Username: admin
Password: admin

After the first login, you'll be prompted to change the password:

  • Set a new password (recommended)
  • Click "Skip" to keep the default (acceptable for development environments)

After logging in, you will see:

  • 📊 LLM Performance Dashboard - a pre-configured performance dashboard
  • 📈 Data Source Configuration - auto-connected to Prometheus and Jaeger
  • 🎯 Real-time Monitoring Panel - key metrics such as token generation speed and latency

🎯 Core Features

⚡ LLM-Specific Monitoring

  • Token Generation Speed: Real-time monitoring of tokens generated per second
  • TTFT (Time to First Token): Latency until the first token is returned
  • Streaming Response Analysis: Per-token generation timestamps
  • Model Performance Comparison: Performance benchmarks across different models
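
To make the first two metrics concrete, here is a minimal, self-contained sketch (illustrative only, not part of the Nexent SDK) of how TTFT and tokens/s can be derived from a streamed response:

```python
import time

def stream_with_timing(token_iter):
    """Wrap any iterable of streamed tokens (hypothetical input, not a
    Nexent API) and derive the two headline metrics: TTFT and tokens/s."""
    start = time.perf_counter()
    first_token_at = None
    count = 0
    for token in token_iter:
        if first_token_at is None:
            first_token_at = time.perf_counter()  # TTFT reference point
        count += 1
        yield token
    elapsed = time.perf_counter() - start
    if first_token_at is not None and elapsed > 0:
        print(f"TTFT: {first_token_at - start:.3f}s, "
              f"rate: {count / elapsed:.1f} tokens/s")
```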

๐Ÿ” Distributed Tracing โ€‹

  • Complete Request Chain: End-to-end tracing from HTTP to LLM
  • Performance Bottleneck Detection: Automatically identify slow queries and anomalies
  • Error Root Cause Analysis: Quickly locate problem sources
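
For illustration, the request chain above can be reproduced with the standard OpenTelemetry Python API; the span names and the llm.model attribute below are illustrative choices, not necessarily the exact ones Nexent emits:

```python
from opentelemetry import trace

tracer = trace.get_tracer("nexent.example")  # hypothetical tracer name

def handle_chat_request(messages):
    # Outer span: the HTTP layer; inner span: the LLM call. Jaeger
    # stitches these parent/child spans into the end-to-end chain.
    with tracer.start_as_current_span("http.handle_request"):
        with tracer.start_as_current_span("llm.chat_completion") as span:
            span.set_attribute("llm.model", "gpt-4")
            return call_llm(messages)  # call_llm as defined later on this page
```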

๐Ÿ› ๏ธ Developer-Friendly Design โ€‹

  • One-Line Integration: Add monitoring with a single decorator
  • Zero-Dependency Degradation: Auto-skip when monitoring dependencies are missing
  • Zero-Touch Usage: Monitoring status is handled automatically; no manual checks needed
  • Flexible Configuration: Behavior controlled via environment variables

๐Ÿ› ๏ธ Adding Monitoring to Code โ€‹

```python
# Backend service usage - directly use the globally configured monitoring_manager
from utils.monitoring import monitoring_manager

# API endpoint monitoring
@monitoring_manager.monitor_endpoint("my_service.my_function")
async def my_api_function():
    return {"status": "ok"}

# LLM call monitoring
@monitoring_manager.monitor_llm_call("gpt-4", "chat_completion")
def call_llm(messages):
    # Token-level monitoring is applied automatically
    return llm_response

# Manual monitoring events
monitoring_manager.add_span_event("custom_event", {"key": "value"})
monitoring_manager.set_span_attributes(user_id="123", action="process")
```

📦 Direct SDK Usage

```python
from nexent.monitor import get_monitoring_manager

# Get the global monitoring manager - already configured in the backend
monitor = get_monitoring_manager()

# Use decorators
@monitor.monitor_llm_call("claude-3", "completion")
def my_llm_function():
    return "response"

# Or use it directly in business logic
def my_business_function():
    with monitor.trace_llm_request("custom_operation", "my_model") as span:
        # Execute business logic
        result = process_data()
        monitor.add_span_event("processing_completed")
        return result
```

✨ Global Configuration Automation

Monitoring configuration is auto-initialized in backend/utils/monitoring.py:

```python
# No manual configuration needed - auto-completed at system startup
# monitoring_manager is already configured from environment variables
from utils.monitoring import monitoring_manager

# Direct usage without checking whether monitoring is enabled
@monitoring_manager.monitor_endpoint("my_function")
def my_function():
    pass

# FastAPI application initialization
monitoring_manager.setup_fastapi_app(app)
```

🔒 Auto Start/Stop Design

  • Smart Monitoring: Auto start/stop based on the ENABLE_TELEMETRY environment variable
  • Zero-Touch Usage: External code never needs to check monitoring status; use all features directly
  • Graceful Degradation: Silently a no-op when disabled, fully active when enabled (see the sketch after the snippet below)
  • Default Off: Auto-disabled when not configured
```bash
# Enable monitoring
export ENABLE_TELEMETRY=true

# Disable monitoring
export ENABLE_TELEMETRY=false
```
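
The pattern behind this graceful degradation looks roughly like the following sketch; it illustrates the idea and is not the actual monitoring_manager implementation:

```python
import functools
import os

def monitor_endpoint(name):
    """Decorator factory sketch: when telemetry is disabled, return the
    function unchanged, so callers never have to check monitoring state."""
    enabled = os.getenv("ENABLE_TELEMETRY", "false").lower() == "true"

    def decorator(func):
        if not enabled:
            return func  # zero-overhead pass-through when disabled

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # When enabled, spans and metrics would be recorded around the call.
            return func(*args, **kwargs)

        return wrapper

    return decorator
```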

📊 Core Monitoring Metrics

| Metric | Description | Importance |
| --- | --- | --- |
| llm_token_generation_rate | Token generation speed (tokens/s) | ⭐⭐⭐ |
| llm_time_to_first_token_seconds | First-token latency | ⭐⭐⭐ |
| llm_request_duration_seconds | Complete request duration | ⭐⭐⭐ |
| llm_total_tokens | Input/output token count | ⭐⭐ |
| llm_error_count | LLM call error count | ⭐⭐⭐ |
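
These metrics can also be pulled programmatically through Prometheus's standard HTTP API. The sketch below queries llm_token_generation_rate from the table above; the model label is an assumption about how the metric is tagged:

```python
import requests

# Query the current token generation rate via Prometheus's /api/v1/query
# endpoint (Prometheus URL from "Access Monitoring Interfaces" above).
resp = requests.get(
    "http://localhost:9090/api/v1/query",
    params={"query": "llm_token_generation_rate"},
    timeout=5,
)
resp.raise_for_status()
for series in resp.json()["data"]["result"]:
    _timestamp, value = series["value"]
    model = series["metric"].get("model", "unknown")  # assumed label name
    print(f"{model}: {float(value):.1f} tokens/s")
```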

🔧 Environment Configuration

```bash
# Add to .env file
cat >> .env << EOF
ENABLE_TELEMETRY=true
SERVICE_NAME=nexent-backend
JAEGER_ENDPOINT=http://localhost:14268/api/traces
LLM_SLOW_REQUEST_THRESHOLD_SECONDS=5.0
LLM_SLOW_TOKEN_RATE_THRESHOLD=10.0
TELEMETRY_SAMPLE_RATE=1.0  # 1.0 for development; 0.1 recommended for production
EOF
```
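
TELEMETRY_SAMPLE_RATE is the fraction of requests to trace, from 0.0 to 1.0 (1.0 traces everything; 0.1 traces roughly one in ten). As a sketch, assuming it maps onto OpenTelemetry's standard ratio sampler:

```python
import os

from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import TraceIdRatioBased

# Assumption: TELEMETRY_SAMPLE_RATE feeds a trace-ID ratio sampler, so a
# rate of 0.1 keeps roughly 1 in 10 traces.
sample_rate = float(os.getenv("TELEMETRY_SAMPLE_RATE", "1.0"))
provider = TracerProvider(sampler=TraceIdRatioBased(sample_rate))
```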

๐Ÿ› ๏ธ System Verification โ€‹

```bash
# Check the metrics endpoint
curl http://localhost:8000/metrics

# Verify dependency installation
python -c "from backend.utils.monitoring import MONITORING_AVAILABLE; print(f'Monitoring Available: {MONITORING_AVAILABLE}')"
```

🆘 Troubleshooting

No monitoring data?

```bash
# Check service status
docker-compose -f docker/docker-compose-monitoring.yml ps

# Check dependency installation
python -c "import opentelemetry; print('✅ Monitoring dependencies installed')"
```

Port conflicts?

```bash
# Check port usage
lsof -i :3005 -i :9090 -i :16686
```

Dependency installation issues?

```bash
# Reinstall performance dependencies
uv sync --extra performance

# Check the performance configuration in pyproject.toml
grep -A 20 "performance" backend/pyproject.toml
```

Service name shows as unknown_service?

```bash
# Check environment variable configuration
echo "SERVICE_NAME: $SERVICE_NAME"

# Restart the monitoring services to apply the new configuration
./docker/start-monitoring.sh
```
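
unknown_service is OpenTelemetry's fallback name when no service.name resource attribute is set. A sketch of wiring SERVICE_NAME through (an assumption about how Nexent names its traces, not confirmed source):

```python
import os

from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider

# Map SERVICE_NAME onto the OpenTelemetry service.name resource attribute;
# without it, exporters report the service as "unknown_service".
resource = Resource.create(
    {"service.name": os.getenv("SERVICE_NAME", "nexent-backend")}
)
provider = TracerProvider(resource=resource)
```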

🧹 Data Management

Clean Jaeger Trace Data

```bash
# Method 1: Restart the Jaeger container (simplest)
docker-compose -f docker/docker-compose-monitoring.yml restart nexent-jaeger

# Method 2: Completely rebuild the Jaeger container and its data
docker-compose -f docker/docker-compose-monitoring.yml stop nexent-jaeger
docker-compose -f docker/docker-compose-monitoring.yml rm -f nexent-jaeger
docker-compose -f docker/docker-compose-monitoring.yml up -d nexent-jaeger

# Method 3: Clean all monitoring data (rebuild all containers)
docker-compose -f docker/docker-compose-monitoring.yml down
docker-compose -f docker/docker-compose-monitoring.yml up -d
```

Clean Prometheus Metrics Data

```bash
# Restart the Prometheus container
docker-compose -f docker/docker-compose-monitoring.yml restart nexent-prometheus

# Completely clean Prometheus data
docker-compose -f docker/docker-compose-monitoring.yml stop nexent-prometheus
docker volume rm docker_prometheus_data 2>/dev/null || true
docker-compose -f docker/docker-compose-monitoring.yml up -d nexent-prometheus
```

Clean Grafana Configuration

```bash
# Reset Grafana configuration and dashboards
docker-compose -f docker/docker-compose-monitoring.yml stop nexent-grafana
docker volume rm docker_grafana_data 2>/dev/null || true
docker-compose -f docker/docker-compose-monitoring.yml up -d nexent-grafana
```

📈 Typical Problem Analysis

Slow token generation (< 5 tokens/s)

  1. Analysis: Grafana → Token Generation Rate panel
  2. Solution: Check model service load, optimize input prompt length

Slow request response (> 10s)

  1. Analysis: Jaeger → View the complete trace chain
  2. Solution: Locate the bottleneck (database/LLM/network)

Error rate spike (> 10%)

  1. Analysis: Prometheus → llm_error_count metric
  2. Solution: Check model service availability, verify API keys

🎉 Getting Started

Once setup is complete, you can:

  1. 📊 View the LLM Performance Dashboard in Grafana
  2. 🔍 Trace complete request chains in Jaeger
  3. 📈 Analyze token generation speed and performance bottlenecks
  4. 🚨 Set performance alerts and thresholds

Enjoy efficient LLM performance monitoring! 🚀