# PHI-Aware STT Routing

Voice Mode v4.1 introduces PHI-aware speech-to-text (STT) routing so that Protected Health Information (PHI) stays on-premises when HIPAA compliance requires it.
## Overview

The PHI-aware STT router routes audio based on content sensitivity:
```
┌─────────────────────────────────────────────────────────────────┐
│                           Audio Input                           │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│        ┌──────────────┐      ┌───────────────────┐              │
│        │ PHI Detector │─────▶│ Sensitivity Score │              │
│        └──────────────┘      └───────────────────┘              │
│                                        │                        │
│                        ┌───────────────┼───────────────┐        │
│                        ▼               ▼               ▼        │
│                   Score < 0.3  0.3 ≤ Score < 0.7  Score ≥ 0.7   │
│                        │               │               │        │
│                        ▼               ▼               ▼        │
│                 ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│                 │  Cloud STT  │ │ Hybrid Mode │ │Local Whisper│ │
│                 │ (OpenAI/GCP)│ │ (Redacted)  │ │  (On-Prem)  │ │
│                 └─────────────┘ └─────────────┘ └─────────────┘ │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
## Thinker-Talker Pipeline Integration
```mermaid
sequenceDiagram
    participant User
    participant Frontend
    participant VoicePipeline
    participant PHIRouter
    participant Thinker as Thinker (LLM)
    participant Talker as Talker (TTS)
    participant Telemetry

    User->>Frontend: Speaks audio
    Frontend->>VoicePipeline: Audio stream
    VoicePipeline->>PHIRouter: route(audio_context)
    Note over PHIRouter: PHI Detection & Scoring
    PHIRouter->>Telemetry: update_routing_state()
    Telemetry-->>Frontend: PHI mode indicator (🛡️/🔒/☁️)

    alt PHI Score >= 0.7
        PHIRouter->>VoicePipeline: route="local"
        Note over VoicePipeline: Use Local Whisper
    else PHI Score 0.3-0.7
        PHIRouter->>VoicePipeline: route="hybrid"
        Note over VoicePipeline: Use Cloud + Redaction
    else PHI Score < 0.3
        PHIRouter->>VoicePipeline: route="cloud"
        Note over VoicePipeline: Use Cloud STT
    end

    VoicePipeline->>Thinker: transcript + context
    Thinker-->>VoicePipeline: response_stream
    VoicePipeline->>Talker: text_stream
    Talker-->>Frontend: audio_chunks
    Frontend-->>User: Plays response
```
## Routing Priority Order
```mermaid
flowchart TD
    A[Audio Input] --> B{Session has prior PHI?}
    B -->|Yes| L[LOCAL<br/>🛡️ On-device Whisper]
    B -->|No| C{PHI Score >= 0.7?}
    C -->|Yes| L
    C -->|No| D{PHI Score >= 0.3?}
    D -->|Yes| H[HYBRID<br/>🔒 Cloud + Redaction]
    D -->|No| E{Medical Context?}
    E -->|Yes| H
    E -->|No| CL[CLOUD<br/>☁️ Standard STT]
    L --> T[Thinker-Talker Pipeline]
    H --> T
    CL --> T
    style L fill:#90EE90
    style H fill:#FFE4B5
    style CL fill:#ADD8E6
```
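The priority order above can be sketched as a plain function. This is an illustration only; `route_audio` and `SessionContext` are hypothetical names, not the documented API:

```python
from dataclasses import dataclass


@dataclass
class SessionContext:
    has_prior_phi: bool = False
    is_medical_context: bool = False


def route_audio(phi_score: float, ctx: SessionContext) -> str:
    """Apply the routing priority: prior PHI wins, then score
    thresholds, then medical context, then the cloud default."""
    if ctx.has_prior_phi:
        return "local"  # sticky: once PHI appears, stay on-prem
    if phi_score >= 0.7:
        return "local"
    if phi_score >= 0.3:
        return "hybrid"
    if ctx.is_medical_context:
        return "hybrid"
    return "cloud"
```

Note that session history outranks the per-utterance score: a low-scoring follow-up question in a session that already contained PHI still stays local.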
## PHI Detection

### Detection Signals

The PHI detector analyzes multiple signals to score content sensitivity:
| Signal | Weight | Examples |
|---|---|---|
| Medical entity detection | 0.4 | "My doctor said...", "I take metformin" |
| Personal identifiers | 0.3 | Names, DOB, SSN patterns |
| Appointment context | 0.2 | "My appointment at...", "Dr. Smith" |
| Session history | 0.1 | Previous PHI in conversation |
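A weighted combination of these signals might look like the following sketch (illustrative only; `SIGNAL_WEIGHTS` and `score_sensitivity` are assumed names, not the real detector):

```python
# Weights from the table above; they sum to 1.0, so the combined
# score stays within [0, 1].
SIGNAL_WEIGHTS = {
    "medical_entity": 0.4,
    "personal_identifier": 0.3,
    "appointment_context": 0.2,
    "session_history": 0.1,
}


def score_sensitivity(signals: dict[str, float]) -> float:
    """Combine per-signal confidences (each 0.0-1.0) into a
    weighted sensitivity score."""
    return sum(
        SIGNAL_WEIGHTS[name] * min(max(conf, 0.0), 1.0)
        for name, conf in signals.items()
        if name in SIGNAL_WEIGHTS
    )
```

For example, a confident medication mention plus prior PHI in the session (both at confidence 1.0) would yield 0.4 + 0.1 = 0.5, landing in the hybrid band.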
### Sensitivity Scores
| Score Range | Classification | Routing Decision |
|---|---|---|
| 0.0 - 0.29 | General | Cloud STT (fastest) |
| 0.3 - 0.69 | Potentially Sensitive | Hybrid mode (redacted) |
| 0.7 - 1.0 | PHI Detected | Local Whisper (secure) |
## Routing Strategies

### 1. Cloud STT (Default)

For general queries with no PHI indicators:
```python
from app.services.phi_stt_router import PHISTTRouter

router = PHISTTRouter()

# General query - routes to cloud
result = await router.transcribe(
    audio_data=audio_bytes,
    session_id="session_123"
)
# result.provider = "openai_whisper"
# result.phi_score = 0.15
# result.routing = "cloud"
```
### 2. Local Whisper (Secure)

For queries with high PHI probability:
```python
# PHI detected - routes to local Whisper
result = await router.transcribe(
    audio_data=audio_bytes,
    session_id="session_123",
    context={"has_prior_phi": True}  # Session context
)
# result.provider = "local_whisper"
# result.phi_score = 0.85
# result.routing = "local"
# result.phi_entities = ["medication", "condition"]
```
### 3. Hybrid Mode (Redacted)

For borderline cases, audio is processed with entity redaction:
```python
# Borderline - uses hybrid with redaction
result = await router.transcribe(
    audio_data=audio_bytes,
    session_id="session_123"
)
# result.provider = "openai_whisper_redacted"
# result.phi_score = 0.45
# result.routing = "hybrid"
# result.redacted_entities = ["name", "date"]
```
## Configuration

### Environment Variables
```bash
# Enable PHI-aware routing
VOICE_V4_PHI_ROUTING=true

# Local Whisper model path
WHISPER_MODEL_PATH=/opt/voiceassist/models/whisper-large-v3
WHISPER_MODEL_SIZE=large-v3

# Cloud STT provider (fallback)
STT_PROVIDER=openai  # openai, google, azure

# PHI detection thresholds
PHI_THRESHOLD_LOCAL=0.7
PHI_THRESHOLD_HYBRID=0.3

# Session context window (for PHI history)
PHI_SESSION_CONTEXT_WINDOW=10  # messages
```
### Feature Flag
```python
# Check if PHI routing is enabled
from app.core.feature_flags import feature_flag_service

if await feature_flag_service.is_enabled("backend.voice_v4_phi_routing"):
    router = PHISTTRouter()
else:
    router = StandardSTTRouter()
```
## Local Whisper Setup

### Installation
```bash
# Install faster-whisper (optimized inference)
pip install faster-whisper

# Download model
python -c "
from faster_whisper import WhisperModel
model = WhisperModel('large-v3', device='cuda', compute_type='float16')
print('Model downloaded successfully')
"
```
### Model Options
| Model | Size | VRAM | RTF* | Quality |
|---|---|---|---|---|
| tiny | 39 MB | 1 GB | 0.03 | Basic |
| base | 74 MB | 1 GB | 0.05 | Good |
| small | 244 MB | 2 GB | 0.08 | Better |
| medium | 769 MB | 5 GB | 0.15 | Great |
| large-v3 | 1.5 GB | 10 GB | 0.25 | Best |
*Real-time factor (lower is faster)
### GPU Requirements
- Minimum: NVIDIA GPU with 4GB VRAM (small model)
- Recommended: NVIDIA GPU with 10GB VRAM (large-v3)
- CPU Fallback: Available but 5-10x slower
## UI Integration

### PHI Indicator Component
```tsx
import { PHIIndicator } from "@/components/voice/PHIIndicator";

<PHIIndicator
  routing={result.routing}  // "cloud" | "hybrid" | "local"
  phiScore={result.phi_score}
  showDetails={true}
/>;
```
### Visual States
| Routing | Icon | Color | Tooltip |
|---|---|---|---|
| cloud | ☁️ | Blue | "Using cloud transcription" |
| hybrid | 🔒 | Yellow | "Sensitive content detected" |
| local | 🛡️ | Green | "Secure local processing" |
## Subscribing to PHI Routing Updates (Frontend)

The `PHITelemetryService` provides real-time PHI routing state to the frontend via WebSocket events and a polling API.
### Option 1: WebSocket Subscription
```tsx
import { useEffect, useState } from "react";
import { useWebSocket } from "@/hooks/useWebSocket";

interface PHIState {
  sessionId: string;
  phiMode: "local" | "hybrid" | "cloud";
  phiScore: number;
  isSecureMode: boolean;
  hasPriorPhi: boolean;
  indicatorColor: "green" | "yellow" | "blue";
  indicatorIcon: "shield" | "lock" | "cloud";
  tooltip: string;
}

function usePHIRoutingState(sessionId: string) {
  const [phiState, setPHIState] = useState<PHIState | null>(null);
  const { subscribe, unsubscribe } = useWebSocket();

  useEffect(() => {
    // Subscribe to PHI telemetry events
    const handlePHIEvent = (event: { type: string; data: PHIState }) => {
      if (event.type === "phi.routing_decision" || event.type === "phi.mode_change") {
        setPHIState(event.data);
      }
    };

    subscribe(`phi.${sessionId}`, handlePHIEvent);
    return () => unsubscribe(`phi.${sessionId}`, handlePHIEvent);
  }, [sessionId, subscribe, unsubscribe]);

  return phiState;
}
```
### Option 2: REST API Polling
```tsx
// GET /api/voice/phi-state/{session_id}
// Returns current PHI routing state for the session
async function fetchPHIState(sessionId: string): Promise<PHIState> {
  const response = await fetch(`/api/voice/phi-state/${sessionId}`);
  return response.json();
}

// Example usage in a component
function PHIIndicator({ sessionId }: { sessionId: string }) {
  const [state, setState] = useState<PHIState | null>(null);

  useEffect(() => {
    const interval = setInterval(async () => {
      const newState = await fetchPHIState(sessionId);
      setState(newState);
    }, 1000); // Poll every second
    return () => clearInterval(interval);
  }, [sessionId]);

  if (!state) return null;

  return (
    <div className={`phi-indicator phi-${state.indicatorColor}`}>
      <span className="icon">{getIcon(state.indicatorIcon)}</span>
      <span className="tooltip">{state.tooltip}</span>
    </div>
  );
}
```
### Backend API for Frontend State
```python
# In your FastAPI router
from fastapi import HTTPException

from app.services.phi_stt_router import get_phi_stt_router

@router.get("/api/voice/phi-state/{session_id}")
async def get_phi_state(session_id: str):
    """Get current PHI routing state for frontend indicator."""
    # Use a distinct name so we don't shadow the FastAPI `router` above
    phi_router = get_phi_stt_router()
    state = phi_router.get_frontend_state(session_id)
    if state is None:
        raise HTTPException(404, "Session not found")
    return state
```
### Telemetry Event Types
| Event Type | Description | Payload |
|---|---|---|
| `phi.routing_decision` | New routing decision made | Full PHI state + previous mode |
| `phi.mode_change` | PHI mode changed (e.g., cloud → local) | From/to modes, reason |
| `phi.phi_detected` | PHI entities detected in audio | Score, entity types |
| `phi.session_start` | New PHI session initialized | Initial state |
| `phi.session_end` | PHI session ended | Final mode, had-PHI flag |
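One plausible shape for these events, sketched as a dataclass; `PHIEvent` and `make_mode_change_event` are hypothetical names, and the real `PHITelemetryService` payloads may differ:

```python
import time
from dataclasses import dataclass, field


@dataclass
class PHIEvent:
    type: str          # e.g. "phi.mode_change"
    session_id: str
    data: dict         # event-specific payload, per the table above
    ts: float = field(default_factory=time.time)


def make_mode_change_event(
    session_id: str, from_mode: str, to_mode: str, reason: str
) -> PHIEvent:
    """Build a phi.mode_change event carrying from/to modes and a reason."""
    return PHIEvent(
        type="phi.mode_change",
        session_id=session_id,
        data={"from": from_mode, "to": to_mode, "reason": reason},
    )
```

An event like this would be serialized and pushed over the `phi.{sessionId}` WebSocket channel that the frontend hook above subscribes to.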
## Audit Logging

All PHI routing decisions are logged for compliance:
```python
logger.info("PHI routing decision", extra={
    "session_id": session_id,
    "phi_score": 0.85,
    "routing_decision": "local",
    "detection_signals": ["medication_mention", "condition_name"],
    "provider": "local_whisper",
    "processing_time_ms": 234,
    "model": "whisper-large-v3",
})
```
### Prometheus Metrics
```python
# Routing distribution
stt_routing_total.labels(routing="local").inc()
stt_routing_total.labels(routing="cloud").inc()
stt_routing_total.labels(routing="hybrid").inc()

# PHI detection accuracy
phi_detection_score_histogram.observe(phi_score)

# Latency by routing type
stt_latency_ms.labels(routing="local").observe(234)
```
## Testing

### Unit Tests
```python
@pytest.mark.asyncio
async def test_phi_routing_high_score():
    """High PHI score routes to local Whisper."""
    router = PHISTTRouter()

    # Mock audio with PHI content
    audio = generate_test_audio("I take metformin for my diabetes")
    result = await router.transcribe(audio)

    assert result.routing == "local"
    assert result.phi_score >= 0.7
    assert result.provider == "local_whisper"


@pytest.mark.asyncio
async def test_phi_routing_low_score():
    """Low PHI score routes to cloud."""
    router = PHISTTRouter()

    # Mock audio without PHI
    audio = generate_test_audio("What is the weather today?")
    result = await router.transcribe(audio)

    assert result.routing == "cloud"
    assert result.phi_score < 0.3
```
### Integration Tests
```bash
# Run PHI routing tests
pytest tests/services/test_phi_stt_router.py -v

# Test with real audio samples
pytest tests/integration/test_phi_routing_e2e.py -v --audio-samples ./test_audio/
```
## Best Practices
- **Default to local for medical context**: If a session involves health topics, bias toward local processing.
- **Cache PHI decisions per session**: Avoid re-evaluating the same session on every utterance.
- **Monitor latency impact**: Local Whisper adds roughly 200 ms; account for this in latency budgets.
- **Update models regularly**: Refresh the Whisper model quarterly for accuracy improvements.
- **Keep an audit trail**: Log every routing decision for compliance audits.