Adaptive Quality Service

Phase 3 - Voice Mode v4.1

Dynamic voice processing quality management based on network conditions and system load.

Overview

The Adaptive Quality Service monitors network performance and system load in real-time, automatically adjusting voice processing quality to maintain optimal user experience within latency budgets.

┌─────────────────────────────────────────────────────────────────┐
│                    Adaptive Quality Service                      │
│                                                                  │
│  Network Metrics ───▶ ┌─────────────┐ ───▶ Quality Level        │
│  (RTT, bandwidth,     │  Quality    │      (ULTRA→MINIMAL)      │
│   packet loss)        │  Adjuster   │                           │
│                       └─────────────┘                           │
│                              │                                   │
│  Latency Budget ────────────▶│◀─────── User Preferences        │
│  (per component)             │                                   │
│                              ▼                                   │
│                       ┌─────────────┐                           │
│                       │  Settings   │                           │
│                       │  Generator  │                           │
│                       └─────────────┘                           │
│                              │                                   │
│                              ▼                                   │
│              STT Model, TTS Model, Bitrate, Features            │
└─────────────────────────────────────────────────────────────────┘

Features

5 Quality Levels: ULTRA, HIGH, MEDIUM, LOW, MINIMAL
Network Monitoring: RTT, bandwidth, packet loss, jitter
Latency Budgets: Per-component budget tracking
Graceful Degradation: Automatic quality reduction
Load Testing: Built-in test utilities
Hysteresis: Prevents quality flapping

Quality Levels

Level	Target Latency	STT Model	TTS Model	Features
ULTRA	800ms	whisper-large-v3	eleven_turbo_v2	All enabled
HIGH	600ms	whisper-1	eleven_turbo_v2	All enabled
MEDIUM	500ms	whisper-1	tts-1	Sentiment, Language
LOW	400ms	whisper-1	tts-1	None
MINIMAL	300ms	whisper-1	tts-1	None, reduced tokens

Detailed Settings per Level

QUALITY_PRESETS = {
    QualityLevel.ULTRA: QualitySettings(
        stt_model="whisper-large-v3",
        tts_model="eleven_turbo_v2",
        audio_bitrate_kbps=128,
        sample_rate_hz=48000,
        max_context_tokens=8000,
        max_response_tokens=2000,
        enable_speaker_diarization=True,
        enable_sentiment_analysis=True,
        enable_language_detection=True,
    ),
    QualityLevel.HIGH: QualitySettings(
        stt_model="whisper-1",
        tts_model="eleven_turbo_v2",
        audio_bitrate_kbps=96,
        sample_rate_hz=24000,
        max_context_tokens=6000,
        max_response_tokens=1500,
        ...
    ),
    # ... and so on
}

Network Conditions

Condition	RTT	Bandwidth	Packet Loss	Auto Level
EXCELLENT	<50ms	>10 Mbps	<0.1%	ULTRA
GOOD	50-150ms	2-10 Mbps	<1%	HIGH
FAIR	150-300ms	0.5-2 Mbps	<5%	MEDIUM
POOR	300-500ms	<0.5 Mbps	<10%	LOW
CRITICAL	>500ms	Very low	>10%	MINIMAL

Feature Flag

# flag_definitions.yaml
backend.voice_v4_adaptive_quality:
  default: false
  description: "Enable adaptive quality management"

Basic Usage

Initialize Session

from app.services.adaptive_quality_service import (
    get_adaptive_quality_service,
    QualityLevel
)

service = get_adaptive_quality_service()
await service.initialize()

# Start session with initial quality
state = await service.init_session(
    session_id="voice-123",
    initial_level=QualityLevel.HIGH,
    user_preference=QualityLevel.MEDIUM  # Optional override
)

print(f"Quality: {state.current_level.value}")
print(f"Target latency: {state.current_settings.target_latency_ms}ms")

Update Network Metrics

from app.services.adaptive_quality_service import NetworkMetrics

# Measure network conditions
metrics = NetworkMetrics(
    rtt_ms=150,
    bandwidth_kbps=5000,
    packet_loss_pct=0.5,
    jitter_ms=15
)

# Update service (may trigger quality change)
state = await service.update_network_metrics("voice-123", metrics)

print(f"Network: {state.network_condition.value}")
print(f"Quality: {state.current_level.value}")

Record Component Latency

# Track latency for budget monitoring
budget = service.record_latency("voice-123", "stt", latency_ms=180)
budget = service.record_latency("voice-123", "llm", latency_ms=250)
budget = service.record_latency("voice-123", "tts", latency_ms=120)

print(f"Total: {budget.total_actual_ms}ms / {budget.total_budget_ms}ms")
print(f"Exceeded: {budget.is_exceeded}")

Get Current Settings

settings = service.get_current_settings("voice-123")

# Use settings in voice pipeline
stt_response = await stt_service.transcribe(
    audio=audio_data,
    model=settings.stt_model,
    sample_rate=settings.sample_rate_hz
)

tts_response = await tts_service.synthesize(
    text=response_text,
    model=settings.tts_model,
    bitrate=settings.audio_bitrate_kbps
)

Latency Budget System

Budget Allocation

Total Budget (e.g., 600ms for HIGH)
├── STT: 25% (150ms)
├── LLM: 35% (210ms)
├── TTS: 25% (150ms)
└── Network: 15% (90ms)

LatencyBudget Class

@dataclass
class LatencyBudget:
    total_budget_ms: int
    stt_budget_ms: int
    llm_budget_ms: int
    tts_budget_ms: int
    network_budget_ms: int

    # Actual measurements
    stt_actual_ms: float = 0
    llm_actual_ms: float = 0
    tts_actual_ms: float = 0
    network_actual_ms: float = 0

    @property
    def is_exceeded(self) -> bool:
        return self.total_actual_ms > self.total_budget_ms

Automatic Degradation

When latency budget is exceeded:

# In voice pipeline
budget = service.record_latency(session_id, "llm", 350)  # Over budget

if budget.is_exceeded:
    # Service automatically triggers degradation
    # Quality: HIGH → MEDIUM
    # New budget: 600ms → 500ms

Quality Change Callbacks

def on_quality_change(state: QualityState, event: DegradationEvent):
    print(f"Quality changed: {event.from_level} → {event.to_level}")
    print(f"Reason: {event.reason}")

    # Update UI
    send_to_frontend({
        "type": "quality_change",
        "level": state.current_level.value,
        "reason": event.reason
    })

service.on_quality_change(on_quality_change)

Hysteresis Logic

The service prevents quality flapping with hysteresis:

# Downgrade: Requires 2+ poor samples in last 3
def _should_downgrade(history):
    recent = history[-3:]
    poor_count = sum(1 for m in recent if m.condition in ["poor", "critical"])
    return poor_count >= 2

# Upgrade: Requires 4+ good samples in last 5
def _should_upgrade(history):
    recent = history[-5:]
    if len(recent) < 5:
        return False
    good_count = sum(1 for m in recent if m.condition in ["excellent", "good"])
    return good_count >= 4

Load Testing

Concurrent Session Test

from app.services.adaptive_quality_service import get_load_test_runner

runner = get_load_test_runner()

result = await runner.run_concurrent_session_test(
    num_sessions=50,
    duration_seconds=120,
    requests_per_second=10
)

print(f"Success rate: {result.success_rate}%")
print(f"P95 latency: {result.p95_latency_ms}ms")
print(f"Degradations: {result.degradations_triggered}")

Degradation Behavior Test

# Test quality degradation under poor network
events = await runner.run_degradation_test(
    session_id="test-session",
    simulate_poor_network=True
)

for event in events:
    print(f"{event.from_level} → {event.to_level}: {event.reason}")

LoadTestResult

@dataclass
class LoadTestResult:
    test_name: str
    concurrent_sessions: int
    duration_seconds: float
    total_requests: int
    successful_requests: int
    failed_requests: int
    avg_latency_ms: float
    p50_latency_ms: float
    p95_latency_ms: float
    p99_latency_ms: float
    degradations_triggered: int

    @property
    def success_rate(self) -> float:
        return self.successful_requests / self.total_requests * 100

Override Thresholds

Custom Network Thresholds

# Define custom condition thresholds
class CustomNetworkMetrics(NetworkMetrics):
    @property
    def condition(self) -> NetworkCondition:
        # Stricter thresholds for healthcare
        if self.rtt_ms < 30 and self.packet_loss_pct < 0.05:
            return NetworkCondition.EXCELLENT
        elif self.rtt_ms < 100 and self.packet_loss_pct < 0.5:
            return NetworkCondition.GOOD
        # ... etc

Custom Quality Presets

# Override preset settings
custom_presets = QUALITY_PRESETS.copy()
custom_presets[QualityLevel.HIGH] = QualitySettings(
    level=QualityLevel.HIGH,
    stt_model="whisper-1",
    tts_model="eleven_multilingual_v2",  # Custom TTS
    target_latency_ms=500,  # Tighter budget
    # ... other settings
)

Frontend Integration

QualityBadge Component

interface QualityBadgeProps {
  level: "ultra" | "high" | "medium" | "low" | "minimal";
  showLabel?: boolean;
}

function QualityBadge({ level, showLabel = true }: QualityBadgeProps) {
  const colors = {
    ultra: "bg-purple-500",
    high: "bg-green-500",
    medium: "bg-yellow-500",
    low: "bg-orange-500",
    minimal: "bg-red-500",
  };

  return (
    <div className={cn("flex items-center gap-1.5 px-2 py-1 rounded-full text-white text-xs", colors[level])}>
      <span className="w-2 h-2 rounded-full bg-white/50" />
      {showLabel && <span className="uppercase font-medium">{level}</span>}
    </div>
  );
}

Real-time Quality Updates

function useQualityState(sessionId: string) {
  const [quality, setQuality] = useState<QualityState | null>(null);

  useEffect(() => {
    const es = new EventSource(`/api/voice/${sessionId}/quality`);

    es.onmessage = (event) => {
      setQuality(JSON.parse(event.data));
    };

    return () => es.close();
  }, [sessionId]);

  return quality;
}

Metrics and Monitoring

Prometheus Metrics

# Exposed metrics
voice_quality_level{session_id, level}
voice_latency_budget_exceeded{session_id}
voice_degradation_total{from_level, to_level, reason}
voice_network_condition{session_id, condition}

Logging

# Quality change log
logger.info(
    "Quality level changed",
    extra={
        "session_id": session_id,
        "from_level": old_level,
        "to_level": new_level,
        "reason": reason,
        "network_condition": condition,
    }
)

Best Practices

Initialize early: Call init_session at voice mode start
Update frequently: Send network metrics every 5-10 seconds
Record all latencies: Track STT, LLM, TTS, and network
Handle callbacks: Update UI when quality changes
Clean up: Call end_session when voice mode ends
Test degradation: Use load tests before deployment

Adaptive Quality Service

Adaptive Quality Service

Overview

Features

Quality Levels

Detailed Settings per Level

Network Conditions

Feature Flag

Basic Usage

Initialize Session

Update Network Metrics

Record Component Latency

Get Current Settings

Latency Budget System

Budget Allocation

LatencyBudget Class

Automatic Degradation

Quality Change Callbacks

Hysteresis Logic

Load Testing

Concurrent Session Test

Degradation Behavior Test

LoadTestResult

Override Thresholds

Custom Network Thresholds

Custom Quality Presets

Frontend Integration

QualityBadge Component

Real-time Quality Updates

Metrics and Monitoring

Prometheus Metrics

Logging

Best Practices

Related Documentation