Operations Overview
Last Updated: 2025-11-27
This document is the central hub for VoiceAssist operations documentation.
Quick Links
Runbooks
All runbooks follow a standardized format with severity levels, step-by-step procedures, and verification steps.
Compliance
For HIPAA compliance, see Security & Compliance.
Incident Severity Levels
| Severity | Description | Response Time |
|---|---|---|
| P1 - Critical | Complete service outage, data loss risk | 15 minutes |
| P2 - High | Major feature broken, significant degradation | 1 hour |
| P3 - Medium | Minor feature broken, degraded performance | 4 hours |
| P4 - Low | Cosmetic issues, minimal impact | 24 hours |
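For scripting around these targets (for example, in an alerting or paging helper), the table can be encoded as a small lookup. This is a minimal sketch: the function name `response_target_minutes` is hypothetical and not part of VoiceAssist, and the values are simply the response times above converted to minutes.

```shell
# Map an incident severity label (P1..P4) to its response-time target in
# minutes, using the values from the severity table above.
# NOTE: response_target_minutes is an illustrative name, not an existing tool.
response_target_minutes() {
  case "$1" in
    P1) echo 15 ;;     # 15 minutes
    P2) echo 60 ;;     # 1 hour
    P3) echo 240 ;;    # 4 hours
    P4) echo 1440 ;;   # 24 hours
    *)  echo "unknown severity: $1" >&2; return 1 ;;
  esac
}
```

For instance, `response_target_minutes P2` prints `60`, and an unrecognized label returns a non-zero status so callers can detect typos.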
Key SLOs
| Metric | Target | Measurement Window |
|---|---|---|
| API Availability | 99.9% | 30 days |
| Success Rate | 99.5% | 30 days |
| P95 Latency | < 200 ms | 30 days |
| Error Rate | < 0.5% | 30 days |
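To make the availability target concrete, it can help to convert it into an error budget, that is, the downtime the target tolerates over the measurement window. The sketch below assumes a simple time-based budget (every minute of the window counts equally); the source does not say how VoiceAssist actually measures availability, and the function name `error_budget_minutes` is hypothetical.

```shell
# Allowed downtime, in minutes, for an availability target over a window.
# $1 = availability target in percent (e.g. 99.9)
# $2 = measurement window in days (e.g. 30)
# ASSUMPTION: time-based budget; the real SLO may be request-based instead.
error_budget_minutes() {
  awk -v target="$1" -v days="$2" \
    'BEGIN { printf "%.1f\n", (100 - target) / 100 * days * 24 * 60 }'
}

# 99.9% over 30 days: 0.1% of 43,200 minutes
error_budget_minutes 99.9 30   # prints 43.2
```

Under that assumption, the 99.9% target leaves roughly 43 minutes of total downtime per 30-day window.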
On-Call Essentials
Quick Diagnostic Commands
# Check service health
curl http://localhost:8000/health
curl http://localhost:8000/ready
# Check all containers
docker compose ps
# View recent logs
docker compose logs --tail=100 voiceassist-server
# Check database
docker compose exec postgres psql -U voiceassist -c "SELECT 1"
# Check Redis
docker compose exec redis redis-cli ping
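The commands above can be wrapped so that each one reports a single OK/FAIL line, which is easier to scan during an incident. This is a sketch, not an existing VoiceAssist script: `run_check` and `run_all_checks` are hypothetical names, and the `curl` options (`-f`, `-sS`, `--max-time`) and the `-T` flag on `docker compose exec` are additions for non-interactive use rather than part of the original commands.

```shell
# Run one check quietly and print a single OK/FAIL line.
# $1 = label; remaining arguments = the command to run.
run_check() {
  label="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    echo "OK   $label"
  else
    echo "FAIL $label"
    return 1
  fi
}

# Run the diagnostics from this section; return non-zero if any failed.
# ASSUMPTION: same endpoints, service names, and DB user as the commands above.
run_all_checks() {
  failed=0
  run_check "health endpoint" curl -fsS --max-time 5 http://localhost:8000/health || failed=1
  run_check "ready endpoint"  curl -fsS --max-time 5 http://localhost:8000/ready  || failed=1
  run_check "postgres" docker compose exec -T postgres psql -U voiceassist -c "SELECT 1" || failed=1
  run_check "redis"    docker compose exec -T redis redis-cli ping || failed=1
  return "$failed"
}
```

Calling `run_all_checks` prints one line per check and returns a non-zero status if any check failed, so it can gate a follow-up step such as a restart or an escalation.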
Escalation Path
- L1 Support: Check health endpoints, restart services
- L2 DevOps: Investigate logs, check metrics, apply standard fixes
- L3 Engineering: Deep debugging, code-level investigation
- Management: Major incidents requiring business decisions
Version History
| Date | Version | Changes |
|---|---|---|
| 2025-11-27 | 1.0.0 | Initial operations overview |