# Service Level Objectives (SLOs) - VoiceAssist V2

**Version:** 1.0
**Last Updated:** 2025-11-27
**Owner:** Platform Engineering Team

## Overview

This document defines the Service Level Objectives (SLOs) for VoiceAssist V2. SLOs are reliability targets that balance user expectations with engineering effort.

### SLO Framework

- **SLI (Service Level Indicator)**: Quantitative measure of service behavior (e.g., request latency, error rate)
- **SLO (Service Level Objective)**: Target value or range for an SLI (e.g., 99.9% availability)
- **SLA (Service Level Agreement)**: Customer-facing commitment with consequences (not yet defined for the internal MVP)

### Error Budget

An error budget is the maximum allowed unreliability before violating an SLO. For a 99.9% availability target over 30 days:

- **Allowed downtime**: 43.2 minutes/month
- **Allowed errors**: 0.1% of requests
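
The 43.2-minute figure follows directly from the target and the window length. A minimal sketch of the arithmetic (the function name `error_budget` is illustrative and not part of the VoiceAssist codebase):

```python
def error_budget(target, window_days=30):
    """Return (budget as a ratio, budget as minutes of full downtime)."""
    budget_ratio = 1.0 - target               # e.g. 1 - 0.999 = 0.001
    window_minutes = window_days * 24 * 60    # 30 days = 43,200 minutes
    return budget_ratio, budget_ratio * window_minutes


ratio, minutes = error_budget(0.999, window_days=30)
print(f"{minutes:.1f} minutes")  # 43.2 minutes
```

The same function applied to the 99.5% success-rate target gives a budget of 0.5% of requests, matching the table in the API Availability SLO.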

## Core SLOs

### 1. API Availability SLO

**Objective:** API endpoints should be available and responsive

| Metric       | Target | Measurement Window | Error Budget   |
| ------------ | ------ | ------------------ | -------------- |
| Availability | 99.9%  | 30 days            | 43.2 min/month |
| Success Rate | 99.5%  | 30 days            | 0.5% errors    |

**SLI Definition:**

```promql
# Availability: Percentage of requests returning 2xx/3xx status
sum(rate(http_requests_total{status_code=~"2..|3.."}[5m]))
/
sum(rate(http_requests_total[5m]))

# Success Rate: Percentage of requests not returning 5xx errors
1 - (
  sum(rate(http_requests_total{status_code=~"5.."}[5m]))
  /
  sum(rate(http_requests_total[5m]))
)
```

**Rationale:**

- 99.9% availability is a common industry target for non-critical services
- Allows for planned maintenance and incident recovery
- Balances reliability with development velocity

**Exclusions:**

- Planned maintenance windows (announced 48h in advance)
- User errors (4xx responses except 429 rate limiting)
- External service failures (OpenAI, Nextcloud) beyond our control
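
The PromQL above counts every non-2xx/3xx response against availability, so the exclusions have to be applied separately. The sketch below shows one possible reading of the user-error exclusion, assuming that 4xx responses other than 429 are dropped from both numerator and denominator while 429 and 5xx count as failures. The function name and the sample counts are hypothetical.

```python
def availability(status_counts):
    """Compute availability from a mapping of HTTP status code -> request count.

    Assumption: 4xx responses other than 429 are user errors and are excluded
    from the SLI entirely; 429 and 5xx responses count as failures.
    """
    good = 0
    bad = 0
    for status, count in status_counts.items():
        if 200 <= status < 400:
            good += count
        elif status == 429 or status >= 500:
            bad += count
        # Any other status (for example 404) is excluded from the SLI.
    eligible = good + bad
    return good / eligible if eligible else None


# Made-up sample data: the 500 responses with status 404 are ignored.
sample = {200: 9980, 404: 500, 429: 10, 503: 10}
print(availability(sample))
```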

---

### 2. API Latency SLO

**Objective:** API requests should complete quickly

| Percentile   | Target   | Measurement Window |
| ------------ | -------- | ------------------ |
| P50 (median) | < 200ms  | 5 minutes          |
| P95          | < 500ms  | 5 minutes          |
| P99          | < 1000ms | 5 minutes          |

**SLI Definition:**

```promql
# P50 (median) latency
histogram_quantile(0.50,
  sum(rate(http_request_duration_seconds_bucket[5m])) by (le, endpoint)
)

# P95 latency
histogram_quantile(0.95,
  sum(rate(http_request_duration_seconds_bucket[5m])) by (le, endpoint)
)

# P99 latency
histogram_quantile(0.99,
  sum(rate(http_request_duration_seconds_bucket[5m])) by (le, endpoint)
)
```
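
For intuition, `histogram_quantile` estimates a percentile from cumulative bucket counts by finding the bucket that contains the target rank and interpolating linearly inside it. The sketch below is a simplified Python approximation of that behavior, not Prometheus's actual implementation; the bucket bounds and counts are made-up sample data.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: list of (upper_bound_seconds, cumulative_count), sorted by
    upper bound, with math.inf as the last upper bound.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            if count == prev_count:
                return bound
            # Linear interpolation within the bucket that contains the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


sample = [(0.1, 50), (0.2, 80), (0.5, 96), (1.0, 100), (math.inf, 100)]
print(histogram_quantile(0.95, sample))  # about 0.481 s, under the 0.5 s target
```

Because the estimate is interpolated, its precision depends on the bucket boundaries. Placing boundaries at or near the SLO thresholds (0.2 s, 0.5 s, 1 s) keeps the estimate meaningful where it matters.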

**Rationale:**

- P50 target ensures fast responses for the majority of requests
- P95/P99 targets catch tail latency issues
- Targets aligned with user patience thresholds (< 1s for interactivity)

**Critical Endpoints:**

- `/api/auth/login`: P95 < 300ms (authentication is time-sensitive)
- `/api/realtime/query`: P95 < 2000ms (RAG queries are more complex)
- `/health`: P95 < 100ms (health checks must be fast)
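
The per-endpoint overrides can be checked against measured values as follows. This is a sketch only: the dictionary and function names are illustrative, the default of 500ms comes from the general P95 target, and `/api/other` is a hypothetical endpoint used to exercise the default.

```python
# Per-endpoint P95 targets in seconds, taken from the list above.
P95_TARGETS_SECONDS = {
    "/api/auth/login": 0.300,
    "/api/realtime/query": 2.000,
    "/health": 0.100,
}
DEFAULT_P95_SECONDS = 0.500  # general P95 target for all other endpoints


def p95_violations(observed):
    """Return {endpoint: (observed_p95, target)} for endpoints missing their target.

    observed: mapping of endpoint -> measured P95 latency in seconds.
    """
    violations = {}
    for endpoint, p95 in observed.items():
        target = P95_TARGETS_SECONDS.get(endpoint, DEFAULT_P95_SECONDS)
        if p95 >= target:  # targets are strict ("< 300ms"), so equality is a miss
            violations[endpoint] = (p95, target)
    return violations


measured = {
    "/api/auth/login": 0.25,
    "/api/realtime/query": 2.4,
    "/health": 0.05,
    "/api/other": 0.6,  # hypothetical endpoint, falls back to the default target
}
print(p95_violations(measured))
```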

---

### 3. RAG Query Quality SLO

**Objective:** RAG queries should return relevant, accurate results

| Metric                 | Target      | Measurement Window |
| ---------------------- | ----------- | ------------------ |
| Query Success Rate     | 99%         | 30 days            |
| Cache Hit Rate         | > 30%       | 24 hours           |
| Average Search Results | > 2 results | 24 hours           |

**SLI Definition:**

```promql
# Query Success Rate
sum(rate(rag_query_duration_seconds_count{stage="total"}[5m]))
/
sum(rate(rag_query_attempts_total[5m]))

# Cache Hit Rate
sum(rate(cache_hits_total{cache_key_prefix="search_results"}[1h]))
/
(sum(rate(cache_hits_total{cache_key_prefix="search_results"}[1h])) + sum(rate(cache_misses_total{cache_key_prefix="search_results"}[1h])))

# Average Search Results
avg(rag_search_results_total)
```

**Rationale:**

- 99% success rate allows for edge cases and system issues
- 30% cache hit rate indicates effective caching strategy
- 2+ results ensure users get actionable information

---

### 4. Database Performance SLO

**Objective:** Database operations should be fast and reliable

| Metric                      | Target  | Measurement Window |
| --------------------------- | ------- | ------------------ |
| Query P95 Latency           | < 100ms | 5 minutes          |
| Connection Success Rate     | 99.9%   | 30 days            |
| Connection Pool Utilization | < 80%   | 5 minutes          |

**SLI Definition:**

```promql
# Query Latency P95
histogram_quantile(0.95,
  sum(rate(db_query_duration_seconds_bucket[5m])) by (le)
)

# Connection Success Rate
1 - (
  sum(rate(db_connection_errors_total[5m]))
  /
  sum(rate(db_query_duration_seconds_count[5m]))
)

# Pool Utilization
sum(db_connections_total{state="in_use"})
/
(sum(db_connections_total{state="in_use"}) + sum(db_connections_total{state="idle"}))
```

**Rationale:**

- 100ms P95 ensures responsive API layer
- High connection success rate prevents cascading failures
- 80% pool utilization threshold leaves headroom for spikes

---

### 5. Cache Performance SLO

**Objective:** Cache should provide performance improvements

| Metric                  | Target          | Measurement Window |
| ----------------------- | --------------- | ------------------ |
| Overall Hit Rate        | > 40%           | 24 hours           |
| L1 Cache Hit Rate       | > 60% (of hits) | 24 hours           |
| Cache Operation Latency | P95 < 10ms      | 5 minutes          |

**SLI Definition:**

```promql
# Overall Cache Hit Rate
sum(rate(cache_hits_total[1h]))
/
(sum(rate(cache_hits_total[1h])) + sum(rate(cache_misses_total[1h])))

# L1 Hit Rate (of all cache hits)
sum(rate(cache_hits_total{cache_layer="l1"}[1h]))
/
sum(rate(cache_hits_total[1h]))

# Cache Operation Latency P95
histogram_quantile(0.95,
  sum(rate(cache_latency_seconds_bucket[5m])) by (le, cache_layer)
)
```

**Rationale:**

- 40% overall hit rate demonstrates effective caching
- 60% L1 hit rate shows hot data staying in fast cache
- Sub-10ms latency ensures the cache doesn't become a bottleneck
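
The two hit-rate SLIs use different denominators: the overall rate is hits over all lookups, while the L1 rate is L1 hits over all hits. A small sketch with made-up counts makes the distinction concrete. It assumes `misses` counts lookups that missed every cache layer, and the `l2` layer name is hypothetical (only `l1` appears in the queries above).

```python
def cache_hit_rates(hits_by_layer, misses):
    """Return (overall_hit_rate, l1_share_of_hits).

    hits_by_layer: mapping of cache layer name -> hit count.
    misses: number of lookups that missed every layer (assumption).
    """
    total_hits = sum(hits_by_layer.values())
    lookups = total_hits + misses
    overall = total_hits / lookups if lookups else None
    l1_share = hits_by_layer.get("l1", 0) / total_hits if total_hits else None
    return overall, l1_share


overall, l1_share = cache_hit_rates({"l1": 300, "l2": 150}, misses=550)
print(f"overall={overall:.2f}, l1_share={l1_share:.2f}")  # overall=0.45, l1_share=0.67
```

With these sample numbers both targets are met: 45% overall is above 40%, and 67% of hits served from L1 is above 60%.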

---

### 6. Document Processing SLO

**Objective:** Document uploads should complete reliably

| Metric              | Target      | Measurement Window |
| ------------------- | ----------- | ------------------ |
| Job Success Rate    | 95%         | 7 days             |
| Processing Time P95 | < 2 minutes | 7 days             |
| Queue Depth         | < 100 jobs  | 5 minutes          |

**SLI Definition:**

```promql
# Job Success Rate
sum(rate(document_processing_jobs_total{status="completed"}[1h]))
/
sum(rate(document_processing_jobs_total[1h]))

# Processing Time P95
histogram_quantile(0.95,
  sum(rate(document_processing_duration_seconds_bucket[1h])) by (le)
)

# Queue Depth
sum(arq_queue_depth)
```

**Rationale:**

- 95% success rate accounts for malformed documents and external API failures
- 2-minute P95 ensures users don't wait excessively
- Queue depth threshold prevents backlog accumulation

---

## SLO Monitoring Strategy

### Recording Rules

Prometheus recording rules pre-compute complex SLI queries:

```yaml
# /infrastructure/observability/prometheus/rules/slo_recording_rules.yml
groups:
  - name: slo_recording_rules
    interval: 30s
    rules:
      # API Availability
      - record: slo:api_availability:ratio_rate5m
        expr: |
          sum(rate(http_requests_total{status_code=~"2..|3.."}[5m]))
          /
          sum(rate(http_requests_total[5m]))

      # API Latency P95
      - record: slo:api_latency_p95:seconds
        expr: |
          histogram_quantile(0.95,
            sum(rate(http_request_duration_seconds_bucket[5m])) by (le, endpoint)
          )

      # API Availability over 1h and 30d (referenced by the alerting rules and appendix queries)
      - record: slo:api_availability:ratio_rate1h
        expr: |
          sum(rate(http_requests_total{status_code=~"2..|3.."}[1h]))
          /
          sum(rate(http_requests_total[1h]))

      - record: slo:api_availability:ratio_rate30d
        expr: |
          sum(rate(http_requests_total{status_code=~"2..|3.."}[30d]))
          /
          sum(rate(http_requests_total[30d]))
```

### Alerting Rules

Alerts fire when SLOs are at risk or violated:

```yaml
# /infrastructure/observability/prometheus/rules/slo_alerts.yml
groups:
  - name: slo_alerts
    rules:
      # Critical: SLO violated
      - alert: APIAvailabilitySLOViolation
        expr: slo:api_availability:ratio_rate5m < 0.999
        for: 5m
        labels:
          severity: critical
          slo: availability
        annotations:
          summary: "API availability below SLO (99.9%)"
          description: "Current availability: {{ $value | humanizePercentage }}"

      # Warning: Error budget at risk (50% consumed)
      - alert: ErrorBudgetAtRisk
        expr: |
          (1 - slo:api_availability:ratio_rate30d) > 0.0005
        for: 15m
        labels:
          severity: warning
          slo: availability
        annotations:
          summary: "Error budget consumption > 50%"
```

### Dashboard Panels

Grafana dashboard panels for SLO tracking:

1. **Availability Overview**: Current vs target availability
2. **Error Budget**: Remaining budget and burn rate
3. **Latency Distribution**: P50/P95/P99 over time
4. **SLO Compliance**: Per-service SLO status (green/yellow/red)
5. **Error Budget Timeline**: 30-day error budget consumption

---

## SLO Review Process

### Weekly Review

- **Owner**: On-call engineer
- **Review**: Current SLO status, recent violations
- **Action**: Update runbooks if patterns emerge

### Monthly Review

- **Owner**: Engineering Manager
- **Review**:
  - SLO compliance trends
  - Error budget consumption patterns
  - SLO target appropriateness
- **Action**: Adjust targets if consistently over/under performing

### Quarterly Review

- **Owner**: Platform Team
- **Review**:
  - SLO framework effectiveness
  - New SLOs needed
  - Retired SLOs
  - Target adjustments based on user feedback
- **Action**: Update SLO document and targets

---

## Error Budget Policy

### When Error Budget is Healthy (> 50% remaining)

- ✅ Deploy new features freely
- ✅ Experiment with new technologies
- ✅ Planned maintenance allowed
- ✅ Refactoring and tech debt work

### When Error Budget is At Risk (25-50% remaining)

- ⚠️ Increase review rigor for deployments
- ⚠️ Prioritize reliability improvements
- ⚠️ Defer non-critical features
- ⚠️ Increase monitoring and alerting

### When Error Budget is Depleted (< 25% remaining)

- 🛑 Feature freeze - only reliability improvements
- 🛑 Increase on-call staffing
- 🛑 Daily SLO status reviews
- 🛑 Defer all non-critical work until budget recovers
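
The three policy bands can be expressed as a simple classification of the remaining budget. This is a sketch under one assumption: the boundary values of exactly 50% and exactly 25% are treated as part of the "at risk" band, since the headings above describe it as 25-50%. The function name and return labels are illustrative.

```python
def budget_policy(remaining):
    """Map the remaining error budget (fraction, 1.0 = untouched) to a policy band."""
    if remaining > 0.50:
        return "healthy"    # deploy freely
    if remaining >= 0.25:
        return "at_risk"    # tighten reviews, defer non-critical features
    return "depleted"       # feature freeze, reliability work only


for value in (0.80, 0.50, 0.25, 0.10):
    print(value, budget_policy(value))
```

A negative input (budget overspent) also falls into the `depleted` band.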

---

## SLO Exceptions

### Planned Maintenance

- Must be announced 48 hours in advance
- Limited to 4 hours/month
- Excluded from availability SLO
- User-facing status page updated

### External Dependencies

Failures of external services are tracked separately:

- OpenAI API failures: Tracked but excluded from API availability SLO
- Nextcloud unavailability: Tracked separately
- DNS/network issues: Excluded if beyond our control

### Known Limitations

- **Cold starts**: First request after deployment may be slow
- **Cache warming**: Cache hit rate temporarily lower after cache clear
- **Large documents**: Processing time varies with document size

---

## Appendix: Prometheus Queries

### Quick SLO Status Check

```promql
# All SLOs at a glance
{__name__=~"slo:.*"}
```

### Error Budget Remaining

```promql
# 30-day error budget remaining (as a fraction of the budget; multiply by 100 for a percentage)
1 - (
  (1 - slo:api_availability:ratio_rate30d) / 0.001
)
```
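
The same calculation in Python, as a worked example with made-up availability values. The function name is illustrative; the 99.9% target comes from the API Availability SLO.

```python
def budget_remaining(availability_30d, target=0.999):
    """Fraction of the error budget still unspent (negative means overspent)."""
    budget = 1.0 - target                          # allowed error ratio, 0.001
    consumed = (1.0 - availability_30d) / budget   # fraction of budget used
    return 1.0 - consumed


print(f"{budget_remaining(0.9995):.2f}")  # 0.50: half the budget is left
print(f"{budget_remaining(0.9980):.2f}")  # -1.00: budget overspent by 100%
```

A 30-day availability of 99.95% corresponds to an error ratio of 0.0005, which is exactly the threshold used by the `ErrorBudgetAtRisk` alert.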

### SLO Burn Rate

```promql
# How fast are we consuming error budget?
# Burn rate = observed error ratio / budgeted error ratio (1 - 0.999)
# > 1.0 means consuming faster than sustainable
(1 - slo:api_availability:ratio_rate1h)
/
0.001
```
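
A minimal sketch of the burn-rate arithmetic, assuming the conventional definition: observed error ratio divided by the budgeted error ratio (1 minus the target). At a burn rate of 1.0 the budget lasts exactly the 30-day window. Function names and sample values are illustrative.

```python
def burn_rate(error_ratio, target=0.999):
    """Observed error ratio relative to the budgeted error ratio."""
    return error_ratio / (1.0 - target)


def hours_to_exhaustion(rate, remaining_fraction=1.0, window_days=30):
    """Hours until the remaining budget is gone if the burn rate holds steady."""
    if rate <= 0:
        return float("inf")
    return remaining_fraction * window_days * 24 / rate


rate = burn_rate(0.002)  # 0.2% of requests failing over the last hour
print(f"burn rate {rate:.1f}, budget gone in {hours_to_exhaustion(rate):.0f} h")
# burn rate 2.0, budget gone in 360 h
```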

---

## References

- [Google SRE Book - SLOs](https://sre.google/sre-book/service-level-objectives/)
- [Prometheus Best Practices - Recording Rules](https://prometheus.io/docs/practices/rules/)
- [Grafana - Stat Visualization](https://grafana.com/docs/grafana/latest/panels-visualizations/visualizations/stat/)

## Contact

For SLO-related questions:

- **Slack**: #platform-sre
- **On-call**: PagerDuty escalation
- **Documentation**: This file and `/docs/operations/RUNBOOKS.md`