# Service Level Objectives (SLOs) - VoiceAssist V2

**Version:** 1.0
**Last Updated:** 2025-11-27
**Owner:** Platform Engineering Team

## Overview

This document defines the Service Level Objectives (SLOs) for VoiceAssist V2. SLOs are reliability targets that balance user expectations with engineering effort.

### SLO Framework

- **SLI (Service Level Indicator)**: Quantitative measure of service behavior (e.g., request latency, error rate)
- **SLO (Service Level Objective)**: Target value or range for an SLI (e.g., 99.9% availability)
- **SLA (Service Level Agreement)**: Customer-facing commitment with consequences (not defined yet for internal MVP)

### Error Budget

An error budget is the maximum allowed unreliability before violating an SLO.
For a 99.9% availability target over 30 days:

- **Allowed downtime**: 43.2 minutes/month
- **Allowed errors**: 0.1% of requests

## Core SLOs

### 1. API Availability SLO

**Objective:** API endpoints should be available and responsive

| Metric       | Target | Measurement Window | Error Budget   |
| ------------ | ------ | ------------------ | -------------- |
| Availability | 99.9%  | 30 days            | 43.2 min/month |
| Success Rate | 99.5%  | 30 days            | 0.5% errors    |

**SLI Definition:**

```promql
# Availability: Percentage of requests returning 2xx/3xx status
# Note: as written, 4xx responses also lower this ratio; the exclusions
# listed below are not applied by the query itself.
sum(rate(http_requests_total{status_code=~"2..|3.."}[5m]))
/
sum(rate(http_requests_total[5m]))

# Success Rate: Percentage of requests not returning 5xx errors
1 - (
  sum(rate(http_requests_total{status_code=~"5.."}[5m]))
  /
  sum(rate(http_requests_total[5m]))
)
```

**Rationale:**

- 99.9% availability is a common target for non-critical services
- Allows for planned maintenance and incident recovery
- Balances reliability with development velocity

**Exclusions:**

- Planned maintenance windows (announced 48h in advance)
- User errors (4xx responses except 429 rate limiting)
- External service failures (OpenAI, Nextcloud) beyond our control

---

### 2. API Latency SLO

**Objective:** API requests should complete quickly

| Percentile   | Target   | Measurement Window |
| ------------ | -------- | ------------------ |
| P50 (median) | < 200ms  | 5 minutes          |
| P95          | < 500ms  | 5 minutes          |
| P99          | < 1000ms | 5 minutes          |

**SLI Definition:**

```promql
# P95 latency
histogram_quantile(0.95,
  sum(rate(http_request_duration_seconds_bucket[5m])) by (le, endpoint)
)

# P99 latency
histogram_quantile(0.99,
  sum(rate(http_request_duration_seconds_bucket[5m])) by (le, endpoint)
)
```

**Rationale:**

- P50 target ensures fast responses for the majority of requests
- P95/P99 targets catch tail latency issues
- Targets aligned with user patience thresholds (< 1s for interactivity)

**Critical Endpoints:**

- `/api/auth/login`: P95 < 300ms (authentication is time-sensitive)
- `/api/realtime/query`: P95 < 2000ms (RAG queries are more complex)
- `/health`: P95 < 100ms (health checks must be fast)

---

### 3. RAG Query Quality SLO

**Objective:** RAG queries should return relevant, accurate results

| Metric                 | Target      | Measurement Window |
| ---------------------- | ----------- | ------------------ |
| Query Success Rate     | 99%         | 30 days            |
| Cache Hit Rate         | > 30%       | 24 hours           |
| Average Search Results | > 2 results | 24 hours           |

**SLI Definition:**

```promql
# Query Success Rate
sum(rate(rag_query_duration_seconds_count{stage="total"}[5m]))
/
sum(rate(rag_query_attempts_total[5m]))

# Cache Hit Rate
# Hits and misses are summed separately so that label sets present in only
# one of the two series are not dropped by vector matching.
sum(rate(cache_hits_total{cache_key_prefix="search_results"}[1h]))
/
(
  sum(rate(cache_hits_total{cache_key_prefix="search_results"}[1h]))
  + sum(rate(cache_misses_total{cache_key_prefix="search_results"}[1h]))
)

# Average Search Results
avg(rag_search_results_total)
```

**Rationale:**

- 99% success rate allows for edge cases and system issues
- 30% cache hit rate indicates an effective caching strategy
- 2+ results ensure users get actionable information

---

### 4. Database Performance SLO

**Objective:** Database operations should be fast and reliable

| Metric                      | Target  | Measurement Window |
| --------------------------- | ------- | ------------------ |
| Query P95 Latency           | < 100ms | 5 minutes          |
| Connection Success Rate     | 99.9%   | 30 days            |
| Connection Pool Utilization | < 80%   | 5 minutes          |

**SLI Definition:**

```promql
# Query Latency P95
histogram_quantile(0.95,
  sum(rate(db_query_duration_seconds_bucket[5m])) by (le)
)

# Connection Success Rate
1 - (
  sum(rate(db_connection_errors_total[5m]))
  /
  sum(rate(db_query_duration_seconds_count[5m]))
)

# Pool Utilization
sum(db_connections_total{state="in_use"})
/
(sum(db_connections_total{state="in_use"}) + sum(db_connections_total{state="idle"}))
```

**Rationale:**

- 100ms P95 keeps the API layer responsive
- High connection success rate prevents cascading failures
- 80% pool utilization threshold leaves headroom for spikes

---

### 5. Cache Performance SLO

**Objective:** Cache should provide performance improvements

| Metric                  | Target          | Measurement Window |
| ----------------------- | --------------- | ------------------ |
| Overall Hit Rate        | > 40%           | 24 hours           |
| L1 Cache Hit Rate       | > 60% (of hits) | 24 hours           |
| Cache Operation Latency | P95 < 10ms      | 5 minutes          |

**SLI Definition:**

```promql
# Overall Cache Hit Rate
sum(rate(cache_hits_total[1h]))
/
(sum(rate(cache_hits_total[1h])) + sum(rate(cache_misses_total[1h])))

# L1 Hit Rate (of all cache hits)
sum(rate(cache_hits_total{cache_layer="l1"}[1h]))
/
sum(rate(cache_hits_total[1h]))

# Cache Operation Latency P95
histogram_quantile(0.95,
  sum(rate(cache_latency_seconds_bucket[5m])) by (le, cache_layer)
)
```

**Rationale:**

- 40% overall hit rate demonstrates effective caching
- 60% L1 hit rate shows hot data staying in the fast cache
- Sub-10ms latency ensures the cache doesn't become a bottleneck

---

### 6. Document Processing SLO

**Objective:** Document uploads should complete reliably

| Metric              | Target      | Measurement Window |
| ------------------- | ----------- | ------------------ |
| Job Success Rate    | 95%         | 7 days             |
| Processing Time P95 | < 2 minutes | 7 days             |
| Queue Depth         | < 100 jobs  | 5 minutes          |

**SLI Definition:**

```promql
# Job Success Rate
sum(rate(document_processing_jobs_total{status="completed"}[1h]))
/
sum(rate(document_processing_jobs_total[1h]))

# Processing Time P95
histogram_quantile(0.95,
  sum(rate(document_processing_duration_seconds_bucket[1h])) by (le)
)

# Queue Depth
sum(arq_queue_depth)
```

**Rationale:**

- 95% success rate accounts for malformed documents and external API failures
- 2-minute P95 ensures users don't wait excessively
- Queue depth threshold prevents backlog accumulation

---

## SLO Monitoring Strategy

### Recording Rules

Prometheus recording rules pre-compute complex SLI queries:

```yaml
# /infrastructure/observability/prometheus/rules/slo_recording_rules.yml
groups:
  - name: slo_recording_rules
    interval: 30s
    rules:
      # API Availability
      - record: slo:api_availability:ratio_rate5m
        expr: |
          sum(rate(http_requests_total{status_code=~"2..|3.."}[5m]))
          /
          sum(rate(http_requests_total[5m]))

      # API Latency P95
      - record: slo:api_latency_p95:seconds
        expr: |
          histogram_quantile(0.95,
            sum(rate(http_request_duration_seconds_bucket[5m])) by (le, endpoint)
          )
```

### Alerting Rules

Alerts fire when SLOs are at risk or violated:

```yaml
# /infrastructure/observability/prometheus/rules/slo_alerts.yml
groups:
  - name: slo_alerts
    rules:
      # Critical: SLO violated
      - alert: APIAvailabilitySLOViolation
        expr: slo:api_availability:ratio_rate5m < 0.999
        for: 5m
        labels:
          severity: critical
          slo: availability
        annotations:
          summary: "API availability below SLO (99.9%)"
          description: "Current availability: {{ $value | humanizePercentage }}"

      # Warning: Error budget at risk (50% consumed)
      # Relies on a slo:api_availability:ratio_rate30d recording rule,
      # which is not among the rules shown above.
      - alert: ErrorBudgetAtRisk
        expr: |
          (1 - slo:api_availability:ratio_rate30d) > 0.0005
        for: 15m
        labels:
          severity: warning
          slo: availability
        annotations:
          summary: "Error budget consumption > 50%"
```

### Dashboard Panels

Grafana dashboard panels for SLO tracking:

1. **Availability Overview**: Current vs target availability
2. **Error Budget**: Remaining budget and burn rate
3. **Latency Distribution**: P50/P95/P99 over time
4. **SLO Compliance**: Per-service SLO status (green/yellow/red)
5. **Error Budget Timeline**: 30-day error budget consumption

---

## SLO Review Process

### Weekly Review

- **Owner**: On-call engineer
- **Review**: Current SLO status, recent violations
- **Action**: Update runbooks if patterns emerge

### Monthly Review

- **Owner**: Engineering Manager
- **Review**:
  - SLO compliance trends
  - Error budget consumption patterns
  - SLO target appropriateness
- **Action**: Adjust targets if the service consistently over- or under-performs

### Quarterly Review

- **Owner**: Platform Team
- **Review**:
  - SLO framework effectiveness
  - New SLOs needed
  - Retired SLOs
  - Target adjustments based on user feedback
- **Action**: Update SLO document and targets

---

## Error Budget Policy

### When Error Budget is Healthy (> 50% remaining)

- ✅ Deploy new features freely
- ✅ Experiment with new technologies
- ✅ Planned maintenance allowed
- ✅ Refactoring and tech debt work

### When Error Budget is At Risk (25-50% remaining)

- ⚠️ Increase review rigor for deployments
- ⚠️ Prioritize reliability improvements
- ⚠️ Defer non-critical features
- ⚠️ Increase monitoring and alerting

### When Error Budget is Depleted (< 25% remaining)

- 🛑 Feature freeze - only reliability improvements
- 🛑 Increase on-call staffing
- 🛑 Daily SLO status reviews
- 🛑 Defer all non-critical work until budget recovers

---

## SLO Exceptions

### Planned Maintenance

- Must be announced 48 hours in advance
- Limited to 4 hours/month
- Excluded from availability SLO
- User-facing status page updated

### External Dependencies

Failures of external services are tracked separately:

- OpenAI API failures:
  Tracked but excluded from API availability SLO
- Nextcloud unavailability: Tracked separately
- DNS/network issues: Excluded if beyond our control

### Known Limitations

- **Cold starts**: First request after deployment may be slow
- **Cache warming**: Cache hit rate is temporarily lower after a cache clear
- **Large documents**: Processing time varies with document size

---

## Appendix: Prometheus Queries

### Quick SLO Status Check

```promql
# All SLOs at a glance
{__name__=~"slo:.*"}
```

### Error Budget Remaining

```promql
# 30-day error budget remaining (as a fraction of the budget; multiply by 100 for a percentage)
1 - (
  (1 - slo:api_availability:ratio_rate30d) / 0.001
)
```

### SLO Burn Rate

```promql
# How fast are we consuming error budget?
# Burn rate = observed error rate / allowed error rate (0.001 for a 99.9% SLO)
# > 1.0 means consuming faster than sustainable
(1 - slo:api_availability:ratio_rate1h) / 0.001
```

---

## References

- [Google SRE Book - SLOs](https://sre.google/sre-book/service-level-objectives/)
- [Prometheus Best Practices - Recording Rules](https://prometheus.io/docs/practices/rules/)
- [Grafana SLO Tracking](https://grafana.com/docs/grafana/latest/panels-visualizations/visualizations/stat/)

## Contact

For SLO-related questions:

- **Slack**: #platform-sre
- **On-call**: PagerDuty escalation
- **Documentation**: This file and `/docs/operations/RUNBOOKS.md`
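
## Appendix: Error Budget Arithmetic (Sketch)

The numbers in the Error Budget and Error Budget Policy sections follow from simple arithmetic. The Python sketch below reproduces the 43.2 minutes/month figure for a 99.9% target over 30 days and maps a remaining-budget fraction onto the three policy tiers. It is illustrative only: the function names (`allowed_downtime_minutes`, `budget_remaining`, `burn_rate`, `policy_tier`) are hypothetical and not part of the VoiceAssist codebase, and burn rate is assumed here to mean the observed error rate divided by the allowed error rate.

```python
MINUTES_PER_DAY = 24 * 60


def allowed_downtime_minutes(slo_target: float, window_days: int = 30) -> float:
    """Minutes of full downtime the error budget permits within the window."""
    return (1.0 - slo_target) * window_days * MINUTES_PER_DAY


def budget_remaining(availability: float, slo_target: float) -> float:
    """Fraction of the error budget still unspent (negative once overspent)."""
    allowed_error = 1.0 - slo_target
    observed_error = 1.0 - availability
    return 1.0 - observed_error / allowed_error


def burn_rate(recent_availability: float, slo_target: float) -> float:
    """Observed error rate relative to the allowed error rate (assumed definition)."""
    return (1.0 - recent_availability) / (1.0 - slo_target)


def policy_tier(remaining: float) -> str:
    """Map a remaining-budget fraction onto the tiers of the Error Budget Policy."""
    if remaining > 0.50:
        return "healthy"    # > 50% remaining
    if remaining >= 0.25:
        return "at_risk"    # 25-50% remaining
    return "depleted"       # < 25% remaining


if __name__ == "__main__":
    target = 0.999
    print(f"allowed downtime: {allowed_downtime_minutes(target):.1f} min / 30 days")
    remaining = budget_remaining(availability=0.9994, slo_target=target)
    print(f"budget remaining: {remaining:.0%} -> {policy_tier(remaining)}")
    print(f"burn rate at 99.8% availability: {burn_rate(0.998, target):.1f}")
```

In practice the availability inputs would come from the `slo:api_availability:*` recording rules; plain floats are used here so the arithmetic can be checked in isolation.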