# Scaling Runbook

**Last Updated**: 2025-11-27
**Purpose**: Comprehensive guide for scaling VoiceAssist V2 infrastructure

---

## Scaling Overview

### Current Architecture

```
Load Balancer (if configured)
    ↓
VoiceAssist Server (Scalable)
    ↓
├── PostgreSQL (Primary + Read Replicas)
├── Redis (Cluster or Sentinel)
└── Qdrant (Distributed)
```

### Scaling Strategy

| Component              | Type      | Method                   | Max Recommended              |
| ---------------------- | --------- | ------------------------ | ---------------------------- |
| **VoiceAssist Server** | Stateless | Horizontal               | 10+ instances                |
| **PostgreSQL**         | Stateful  | Vertical + Read Replicas | 1 primary + 5 replicas       |
| **Redis**              | Stateful  | Vertical + Cluster       | 6 nodes (3 master + 3 slave) |
| **Qdrant**             | Stateful  | Horizontal + Sharding    | 6+ nodes                     |

---

## When to Scale

### Scaling Triggers

#### Immediate Scaling (Reactive)

Scale **immediately** if:

- CPU usage > 80% for 10+ minutes
- Memory usage > 85%
- Response time > 2 seconds (p95)
- Error rate > 5%
- Connection pool exhausted
- Queue depth > 1000

#### Planned Scaling (Proactive)

Schedule scaling if:

- Expected traffic increase (events, marketing campaigns)
- New feature launch with heavy load
- Approaching 70% capacity on any metric
- Seasonal traffic patterns

### Scaling Decision Matrix

```bash
# Quick capacity check
cat > /usr/local/bin/va-capacity-check <<'EOF'
#!/bin/bash
echo "VoiceAssist Capacity Check - $(date)"
echo "========================================"

# Check application load
CPU=$(docker stats --no-stream --format "{{.CPUPerc}}" voiceassist-voiceassist-server-1 | sed 's/%//')
MEM=$(docker stats --no-stream --format "{{.MemPerc}}" voiceassist-voiceassist-server-1 | sed 's/%//')
echo "Application:"
echo "  CPU: ${CPU}%"
echo "  Memory: ${MEM}%"

# Database connections
DB_CONN=$(docker compose exec -T postgres psql -U voiceassist -d voiceassist -t -c \
  "SELECT count(*) FROM pg_stat_activity WHERE state = 'active';" | tr -d ' ')
DB_MAX=$(docker compose exec -T postgres psql -U voiceassist -d voiceassist -t -c \
  "SHOW max_connections;" | tr -d ' ')
DB_USAGE=$((DB_CONN * 100 / DB_MAX))
echo "Database:"
echo "  Active Connections: ${DB_CONN}/${DB_MAX} (${DB_USAGE}%)"

# Redis memory
REDIS_MEM=$(docker compose exec -T redis redis-cli INFO memory | grep used_memory_human | cut -d: -f2 | tr -d '\r')
echo "Redis:"
echo "  Memory Usage: ${REDIS_MEM}"

# Recommendation
echo ""
echo "Scaling Recommendations:"
if (( $(echo "$CPU > 80" | bc -l) )) || (( $(echo "$MEM > 85" | bc -l) )); then
  echo "🔴 IMMEDIATE: Scale application horizontally"
elif (( $(echo "$CPU > 70" | bc -l) )) || (( $(echo "$MEM > 70" | bc -l) )); then
  echo "🟡 SOON: Plan to scale within 24 hours"
elif [ $DB_USAGE -gt 80 ]; then
  echo "🔴 IMMEDIATE: Scale database connections or add read replica"
else
  echo "🟢 OK: Current capacity is adequate"
fi
EOF
chmod +x /usr/local/bin/va-capacity-check
```
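The traffic-light logic in `va-capacity-check` can also be isolated into a plain function so the thresholds can be exercised without Docker running. This sketch assumes integer percentages (the script above uses `bc` to handle fractional values); the function name is illustrative.

```shell
# Sketch: the same scaling thresholds as a standalone, testable function.
# Arguments: CPU%, MEM%, DB connection usage% (integers).
capacity_status() {
  local cpu=$1 mem=$2 db_usage=$3
  if [ "$cpu" -gt 80 ] || [ "$mem" -gt 85 ]; then
    echo "RED: scale application horizontally"
  elif [ "$cpu" -gt 70 ] || [ "$mem" -gt 70 ]; then
    echo "YELLOW: plan to scale within 24 hours"
  elif [ "$db_usage" -gt 80 ]; then
    echo "RED: scale database connections or add read replica"
  else
    echo "GREEN: current capacity is adequate"
  fi
}

capacity_status 85 60 40   # → RED: scale application horizontally
capacity_status 40 40 40   # → GREEN: current capacity is adequate
```

Note the ordering matters: CPU/memory limits are checked before database headroom, matching the recommendation order in the script above.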
---

## Horizontal Scaling - Application Server

### Quick Scale Up

```bash
# Scale to 3 instances
docker compose up -d --scale voiceassist-server=3

# Verify all instances running
docker compose ps voiceassist-server

# Expected output: 3 containers running
# voiceassist-voiceassist-server-1
# voiceassist-voiceassist-server-2
# voiceassist-voiceassist-server-3

# Check health of all instances
for i in {1..3}; do
  echo "Instance $i:"
  docker inspect voiceassist-voiceassist-server-$i | jq '.[0].State.Health.Status'
done
```

### Scale with Load Balancer

```yaml
# Add to docker-compose.yml
services:
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
    depends_on:
      - voiceassist-server

  voiceassist-server:
    # ... existing config ...
    deploy:
      replicas: 3
      resources:
        limits:
          cpus: "2"
          memory: 2G
        reservations:
          cpus: "1"
          memory: 1G
```

```nginx
# Create nginx.conf for load balancing
upstream voiceassist_backend {
    least_conn;  # Use least connections algorithm

    server voiceassist-server-1:8000 max_fails=3 fail_timeout=30s;
    server voiceassist-server-2:8000 max_fails=3 fail_timeout=30s;
    server voiceassist-server-3:8000 max_fails=3 fail_timeout=30s;

    keepalive 32;
}

server {
    listen 80;

    location / {
        proxy_pass http://voiceassist_backend;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

        # Timeouts
        proxy_connect_timeout 5s;
        proxy_send_timeout 60s;
        proxy_read_timeout 60s;

        # Health check
        proxy_next_upstream error timeout http_500 http_502 http_503;
    }

    location /health {
        access_log off;
        proxy_pass http://voiceassist_backend;
    }
}
```

```bash
# Deploy with load balancer
docker compose up -d --scale voiceassist-server=3

# Verify load balancing
for i in {1..10}; do
  curl -s http://localhost/health | jq -r '.hostname'
done
# Hostnames should vary across requests, showing traffic is distributed
# (the upstream uses least_conn, not strict round-robin)
```

### Auto-Scaling with Metrics

```bash
#!/bin/bash
# Save as: /usr/local/bin/va-autoscale

MIN_INSTANCES=2
MAX_INSTANCES=10
SCALE_UP_THRESHOLD=70
SCALE_DOWN_THRESHOLD=30
CHECK_INTERVAL=60

while true; do
  # Get current instance count
  CURRENT=$(docker compose ps -q voiceassist-server | wc -l)

  # Get average CPU across all instances
  AVG_CPU=$(docker stats --no-stream --format "{{.CPUPerc}}" \
    $(docker compose ps -q voiceassist-server) | \
    sed 's/%//g' | \
    awk '{s+=$1; n++} END {print s/n}')

  echo "[$(date)] Instances: $CURRENT, Avg CPU: ${AVG_CPU}%"

  # Scale up
  if (( $(echo "$AVG_CPU > $SCALE_UP_THRESHOLD" | bc -l) )) && [ $CURRENT -lt $MAX_INSTANCES ]; then
    NEW_COUNT=$((CURRENT + 1))
    echo "Scaling UP to $NEW_COUNT instances (CPU: ${AVG_CPU}%)"
    docker compose up -d --scale voiceassist-server=$NEW_COUNT

  # Scale down
  elif (( $(echo "$AVG_CPU < $SCALE_DOWN_THRESHOLD" | bc -l) )) && [ $CURRENT -gt $MIN_INSTANCES ]; then
    NEW_COUNT=$((CURRENT - 1))
    echo "Scaling DOWN to $NEW_COUNT instances (CPU: ${AVG_CPU}%)"
    docker compose up -d --scale voiceassist-server=$NEW_COUNT
  else
    echo "No scaling needed"
  fi

  sleep $CHECK_INTERVAL
done
```

### Graceful Instance Shutdown

```bash
# Scale down with zero downtime
CURRENT=$(docker compose ps -q voiceassist-server | wc -l)
TARGET=$((CURRENT - 1))

echo "Scaling from $CURRENT to $TARGET instances"

# Get last instance
LAST_INSTANCE="voiceassist-voiceassist-server-${CURRENT}"

# Stop accepting new connections (if using load balancer)
docker compose exec nginx nginx -s reload

# Wait for existing connections to drain (30 seconds)
echo "Draining connections..."
sleep 30

# Check remaining connections
ACTIVE_CONN=$(docker exec $LAST_INSTANCE netstat -an | grep :8000 | grep ESTABLISHED | wc -l)
echo "Active connections on instance: $ACTIVE_CONN"

# Scale down
docker compose up -d --scale voiceassist-server=$TARGET
echo "Scaled down to $TARGET instances"
```
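The fixed 30-second sleep in the shutdown script above can be replaced with a poll loop that returns as soon as connections reach zero, or gives up after a timeout. This is a sketch, not part of the runbook: the connection counter is passed in as a command so the `docker exec … netstat` pipeline can be substituted; here it is mocked with a countdown in a temp file.

```shell
# Sketch: poll until connections drain or the timeout expires.
# $1 = timeout in seconds, $2 = command that prints the current connection count.
drain_connections() {
  local timeout=$1 check=$2 waited=0
  while [ "$($check)" -gt 0 ] && [ "$waited" -lt "$timeout" ]; do
    sleep 1
    waited=$((waited + 1))
  done
  [ "$($check)" -le 0 ]  # exit 0 only if fully drained
}

# Mock counter for demonstration: one "connection" closes per check
COUNTER_FILE=$(mktemp)
echo 2 > "$COUNTER_FILE"
mock_count() {
  local n
  n=$(cat "$COUNTER_FILE")
  echo $((n - 1)) > "$COUNTER_FILE"
  echo "$n"
}

drain_connections 10 mock_count && echo "drained"
```

In the runbook itself, `mock_count` would be a small wrapper around the `netstat -an | grep :8000 | grep ESTABLISHED | wc -l` check on `$LAST_INSTANCE`.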
---

## Vertical Scaling - Application Server

### Increase CPU and Memory

```yaml
# Update docker-compose.yml
services:
  voiceassist-server:
    deploy:
      resources:
        limits:
          cpus: "4"    # Increased from 2
          memory: 4G   # Increased from 2G
        reservations:
          cpus: "2"    # Increased from 1
          memory: 2G   # Increased from 1G
```

```bash
# Apply changes
docker compose up -d voiceassist-server

# Verify new limits
docker inspect voiceassist-voiceassist-server-1 | \
  jq '.[0].HostConfig.Memory, .[0].HostConfig.NanoCpus'

# Monitor performance improvement
docker stats voiceassist-voiceassist-server-1
```

### Optimize Application Workers

```bash
# Increase Gunicorn workers in Dockerfile or docker-compose.yml
# Rule: workers = (2 x CPU cores) + 1

# For 4 CPU cores:
WORKERS=9  # (2 x 4) + 1

# Update environment variable
docker compose exec voiceassist-server sh -c \
  "export GUNICORN_WORKERS=$WORKERS && supervisorctl restart gunicorn"

# Verify worker count
docker compose exec voiceassist-server ps aux | grep gunicorn
```

---

## PostgreSQL Scaling

### Vertical Scaling - Increase Resources

```yaml
# Update docker-compose.yml
services:
  postgres:
    deploy:
      resources:
        limits:
          cpus: "4"
          memory: 8G
        reservations:
          cpus: "2"
          memory: 4G
    command:
      - "postgres"
      - "-c"
      - "max_connections=200"              # Increased from 100
      - "-c"
      - "shared_buffers=2GB"               # Increased from 256MB
      - "-c"
      - "effective_cache_size=6GB"         # Increased
      - "-c"
      - "maintenance_work_mem=512MB"       # Increased
      - "-c"
      - "checkpoint_completion_target=0.9"
      - "-c"
      - "wal_buffers=16MB"
      - "-c"
      - "default_statistics_target=100"
      - "-c"
      - "random_page_cost=1.1"
      - "-c"
      - "effective_io_concurrency=200"
      - "-c"
      - "work_mem=10MB"                    # Increased
      - "-c"
      - "min_wal_size=1GB"
      - "-c"
      - "max_wal_size=4GB"                 # Increased
```
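The `shared_buffers=2GB` / `effective_cache_size=6GB` pair above follows the common PostgreSQL rule of thumb of roughly 25% and 75% of available RAM, here applied to the 8G container limit. The arithmetic, as a sketch:

```shell
# Rule of thumb: shared_buffers ≈ 25% of RAM, effective_cache_size ≈ 75%.
RAM_GB=8
SHARED_BUFFERS_GB=$((RAM_GB / 4))
EFFECTIVE_CACHE_GB=$((RAM_GB * 3 / 4))
echo "shared_buffers=${SHARED_BUFFERS_GB}GB effective_cache_size=${EFFECTIVE_CACHE_GB}GB"
# → shared_buffers=2GB effective_cache_size=6GB
```

If the container limit changes, recompute both values rather than scaling one in isolation: `effective_cache_size` is only a planner hint, but `shared_buffers` is real allocated memory.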
"max_wal_size=4GB" # Increased ``` ```bash # Apply changes docker compose up -d postgres # Verify new settings docker compose exec postgres psql -U voiceassist -d voiceassist -c \ "SHOW max_connections; SHOW shared_buffers; SHOW effective_cache_size;" ``` ### Read Replica Setup ```yaml # Add to docker-compose.yml services: postgres-replica: image: postgres:15 environment: POSTGRES_PASSWORD: ${POSTGRES_PASSWORD} volumes: - postgres_replica_data:/var/lib/postgresql/data command: - "postgres" - "-c" - "hot_standby=on" - "-c" - "max_connections=200" depends_on: - postgres volumes: postgres_replica_data: ``` ```bash # Setup replication on primary docker compose exec postgres psql -U voiceassist -d postgres < recovery.conf < redis-sentinel.conf < r.status === 200, }); let token = loginRes.json("access_token"); // Make authenticated requests let headers = { Authorization: `Bearer ${token}`, }; let profileRes = http.get("http://localhost:8000/api/users/me", { headers }); check(profileRes, { "profile retrieved": (r) => r.status === 200, }); sleep(1); } ``` ```bash # Run k6 load test k6 run loadtest.js # With custom output k6 run --out json=results.json loadtest.js # View results cat results.json | jq '.metrics' ``` ### Database Load Testing ```bash # Test PostgreSQL under load # Create pgbench database docker compose exec postgres createdb -U voiceassist pgbench_test # Initialize pgbench docker compose exec postgres pgbench -i -U voiceassist pgbench_test # Run benchmark (100 clients, 1000 transactions each) docker compose exec postgres pgbench \ -c 100 \ -t 1000 \ -U voiceassist \ pgbench_test # Results show: # - TPS (transactions per second) # - Average latency # - Connection time ``` ### Redis Load Testing ```bash # Use redis-benchmark docker compose exec redis redis-benchmark \ -h localhost \ -p 6379 \ -c 100 \ -n 100000 \ -d 100 \ --csv # Test specific commands docker compose exec redis redis-benchmark \ -t set,get,incr,lpush,lpop \ -n 100000 \ -q ``` --- ## Capacity 
---

## Capacity Planning

### Current Capacity Assessment

```bash
#!/bin/bash
# Save as: /usr/local/bin/va-capacity-report

echo "VoiceAssist Capacity Report - $(date)"
echo "========================================"
echo ""

# Application instances
APP_INSTANCES=$(docker compose ps -q voiceassist-server | wc -l)
echo "Application Instances: $APP_INSTANCES"

# Resource usage per instance
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}" \
  $(docker compose ps -q voiceassist-server)
echo ""

# Database metrics
echo "Database Metrics:"
# (The SQL fed to psql here was lost when this page was rendered.)
docker compose exec -T postgres psql -U voiceassist -d voiceassist <<'SQL'
SQL

# Scaling headroom (only the tail of this report survived; the PostgreSQL
# line is reconstructed around the surviving "> 150" threshold)
echo " - PostgreSQL: add a read replica if active connections > 150"
echo " - Redis: Current memory usage allows 2x data growth"
```

### Growth Planning

```bash
# Estimate required resources for growth

# Current metrics (example)
CURRENT_USERS=1000
CURRENT_RPS=50
CURRENT_DB_SIZE_GB=10

# Growth projections
GROWTH_RATE=1.5  # 50% growth
MONTHS=6

# Calculate future requirements
cat > /tmp/capacity_projection.py <<'EOF'
# (The head of this script -- the arithmetic projecting users, RPS, database
#  size, Redis memory, and connection counts -- was lost when this page was
#  rendered; the surviving recommendation logic is kept below. "connections"
#  and "redis_gb" are the projected values it computed.)
if connections > 150:
    print(f"  Database: Primary + 2 read replicas + PgBouncer")
else:
    print(f"  Database: Primary + PgBouncer")

if redis_gb > 4:
    print(f"  Redis: 3-node cluster")
else:
    print(f"  Redis: Single instance ({redis_gb}GB)")
EOF
python3 /tmp/capacity_projection.py
```

---

## Performance Optimization

### Application Optimization

_(Truncated in the source: response-caching settings appended to `.env`, a gzip configuration written to `nginx-compression.conf`, and a final `.env` fragment.)_
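The projection in the Growth Planning section reduces to applying the growth factor to each baseline metric. Whether the source script compounded `GROWTH_RATE` monthly was lost with its head, so this sketch applies a single flat 50% factor (expressed as 3/2 to stay in integer arithmetic):

```shell
# Sketch: flat 50% growth applied to the example baseline from Growth Planning.
CURRENT_USERS=1000
CURRENT_RPS=50
CURRENT_DB_SIZE_GB=10

FUTURE_USERS=$((CURRENT_USERS * 3 / 2))
FUTURE_RPS=$((CURRENT_RPS * 3 / 2))
FUTURE_DB_GB=$((CURRENT_DB_SIZE_GB * 3 / 2))

echo "users=${FUTURE_USERS} rps=${FUTURE_RPS} db=${FUTURE_DB_GB}GB"
# → users=1500 rps=75 db=15GB
```

Compare the projected connection count and Redis footprint against the thresholds used elsewhere in this runbook (150 active connections, 4GB Redis) to decide between the single-node and replicated topologies.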