# PHI-Aware STT Routing

Voice Mode v4.1 introduces PHI-aware speech-to-text (STT) routing so that Protected Health Information (PHI) stays on-premises when HIPAA compliance requires it.
## Overview

The PHI-aware STT router routes audio based on content sensitivity:
```
┌─────────────────────────────────────────────────────────────────┐
│                           Audio Input                           │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│        ┌──────────────┐      ┌───────────────────┐              │
│        │ PHI Detector │─────▶│ Sensitivity Score │              │
│        └──────────────┘      └───────────────────┘              │
│                                        │                        │
│                        ┌───────────────┼───────────────┐        │
│                        ▼               ▼               ▼        │
│                   Score < 0.3  0.3 ≤ Score < 0.7  Score ≥ 0.7   │
│                        │               │               │        │
│                        ▼               ▼               ▼        │
│                 ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│                 │  Cloud STT  │ │ Hybrid Mode │ │Local Whisper│ │
│                 │ (OpenAI/GCP)│ │ (Redacted)  │ │  (On-Prem)  │ │
│                 └─────────────┘ └─────────────┘ └─────────────┘ │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
## Thinker-Talker Pipeline Integration
```mermaid
sequenceDiagram
    participant User
    participant Frontend
    participant VoicePipeline
    participant PHIRouter
    participant Thinker as Thinker (LLM)
    participant Talker as Talker (TTS)
    participant Telemetry

    User->>Frontend: Speaks audio
    Frontend->>VoicePipeline: Audio stream
    VoicePipeline->>PHIRouter: route(audio_context)
    Note over PHIRouter: PHI Detection & Scoring
    PHIRouter->>Telemetry: update_routing_state()
    Telemetry-->>Frontend: PHI mode indicator (🛡️/🔒/☁️)

    alt PHI Score >= 0.7
        PHIRouter->>VoicePipeline: route="local"
        Note over VoicePipeline: Use Local Whisper
    else PHI Score 0.3-0.7
        PHIRouter->>VoicePipeline: route="hybrid"
        Note over VoicePipeline: Use Cloud + Redaction
    else PHI Score < 0.3
        PHIRouter->>VoicePipeline: route="cloud"
        Note over VoicePipeline: Use Cloud STT
    end

    VoicePipeline->>Thinker: transcript + context
    Thinker-->>VoicePipeline: response_stream
    VoicePipeline->>Talker: text_stream
    Talker-->>Frontend: audio_chunks
    Frontend-->>User: Plays response
```
## Routing Priority Order
```mermaid
flowchart TD
    A[Audio Input] --> B{Session has prior PHI?}
    B -->|Yes| L[LOCAL<br/>🛡️ On-device Whisper]
    B -->|No| C{PHI Score >= 0.7?}
    C -->|Yes| L
    C -->|No| D{PHI Score >= 0.3?}
    D -->|Yes| H[HYBRID<br/>🔒 Cloud + Redaction]
    D -->|No| E{Medical Context?}
    E -->|Yes| H
    E -->|No| CL[CLOUD<br/>☁️ Standard STT]
    L --> T[Thinker-Talker Pipeline]
    H --> T
    CL --> T
    style L fill:#90EE90
    style H fill:#FFE4B5
    style CL fill:#ADD8E6
```
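The priority order above can be sketched as a plain function. This is an illustration only; `route_audio` and `SessionContext` are hypothetical names, not the documented API:

```python
from dataclasses import dataclass


@dataclass
class SessionContext:
    has_prior_phi: bool = False
    is_medical_context: bool = False


def route_audio(phi_score: float, ctx: SessionContext) -> str:
    """Apply the routing priority: prior PHI wins, then score
    thresholds, then medical context, then the cloud default."""
    if ctx.has_prior_phi:
        return "local"  # sticky: once PHI appears, stay on-prem
    if phi_score >= 0.7:
        return "local"
    if phi_score >= 0.3:
        return "hybrid"
    if ctx.is_medical_context:
        return "hybrid"
    return "cloud"
```

Note that session history outranks the per-utterance score: a low-scoring follow-up question in a session that already contained PHI still stays local.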
## PHI Detection

### Detection Signals

The PHI detector analyzes multiple signals to score content sensitivity:
| Signal | Weight | Examples |
|---|---|---|
| Medical entity detection | 0.4 | "My doctor said...", "I take metformin" |
| Personal identifiers | 0.3 | Names, DOB, SSN patterns |
| Appointment context | 0.2 | "My appointment at...", "Dr. Smith" |
| Session history | 0.1 | Previous PHI in conversation |
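A weighted combination of these signals might look like the following sketch (illustrative only; `SIGNAL_WEIGHTS` and `score_sensitivity` are assumed names, not the real detector):

```python
# Weights from the table above; they sum to 1.0, so the combined
# score stays within [0, 1].
SIGNAL_WEIGHTS = {
    "medical_entity": 0.4,
    "personal_identifier": 0.3,
    "appointment_context": 0.2,
    "session_history": 0.1,
}


def score_sensitivity(signals: dict[str, float]) -> float:
    """Combine per-signal confidences (each 0.0-1.0) into a
    weighted sensitivity score."""
    return sum(
        SIGNAL_WEIGHTS[name] * min(max(conf, 0.0), 1.0)
        for name, conf in signals.items()
        if name in SIGNAL_WEIGHTS
    )
```

For example, a confident medication mention plus prior PHI in the session (both at confidence 1.0) would yield 0.4 + 0.1 = 0.5, landing in the hybrid band.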
### Sensitivity Scores
| Score Range | Classification | Routing Decision |
|---|---|---|
| 0.0 - 0.29 | General | Cloud STT (fastest) |
| 0.3 - 0.69 | Potentially Sensitive | Hybrid mode (redacted) |
| 0.7 - 1.0 | PHI Detected | Local Whisper (secure) |
## Routing Strategies

### 1. Cloud STT (Default)

For general queries with no PHI indicators:
```python
from app.services.phi_stt_router import PHISTTRouter

router = PHISTTRouter()

# General query - routes to cloud
result = await router.transcribe(
    audio_data=audio_bytes,
    session_id="session_123"
)
# result.provider = "openai_whisper"
# result.phi_score = 0.15
# result.routing = "cloud"
```
### 2. Local Whisper (Secure)

For queries with high PHI probability:
```python
# PHI detected - routes to local Whisper
result = await router.transcribe(
    audio_data=audio_bytes,
    session_id="session_123",
    context={"has_prior_phi": True}  # Session context
)
# result.provider = "local_whisper"
# result.phi_score = 0.85
# result.routing = "local"
# result.phi_entities = ["medication", "condition"]
```
### 3. Hybrid Mode (Redacted)

For borderline cases, audio is processed with entity redaction:
```python
# Borderline - uses hybrid with redaction
result = await router.transcribe(
    audio_data=audio_bytes,
    session_id="session_123"
)
# result.provider = "openai_whisper_redacted"
# result.phi_score = 0.45
# result.routing = "hybrid"
# result.redacted_entities = ["name", "date"]
```
## Configuration

### Environment Variables
```bash
# Enable PHI-aware routing
VOICE_V4_PHI_ROUTING=true

# Local Whisper model path
WHISPER_MODEL_PATH=/opt/voiceassist/models/whisper-large-v3
WHISPER_MODEL_SIZE=large-v3

# Cloud STT provider (fallback)
STT_PROVIDER=openai  # openai, google, azure

# PHI detection thresholds
PHI_THRESHOLD_LOCAL=0.7
PHI_THRESHOLD_HYBRID=0.3

# Session context window (for PHI history)
PHI_SESSION_CONTEXT_WINDOW=10  # messages
```
### Feature Flag
```python
# Check if PHI routing is enabled
from app.core.feature_flags import feature_flag_service

if await feature_flag_service.is_enabled("backend.voice_v4_phi_routing"):
    router = PHISTTRouter()
else:
    router = StandardSTTRouter()
```
## Local Whisper Setup

### Installation
```bash
# Install faster-whisper (optimized inference)
pip install faster-whisper

# Download model
python -c "
from faster_whisper import WhisperModel
model = WhisperModel('large-v3', device='cuda', compute_type='float16')
print('Model downloaded successfully')
"
```
### Model Options
| Model | Size | VRAM | RTF* | Quality |
|---|---|---|---|---|
| tiny | 39 MB | 1 GB | 0.03 | Basic |
| base | 74 MB | 1 GB | 0.05 | Good |
| small | 244 MB | 2 GB | 0.08 | Better |
| medium | 769 MB | 5 GB | 0.15 | Great |
| large-v3 | 1.5 GB | 10 GB | 0.25 | Best |
*Real-time factor (lower is faster)
### GPU Requirements
- Minimum: NVIDIA GPU with 4GB VRAM (small model)
- Recommended: NVIDIA GPU with 10GB VRAM (large-v3)
- CPU Fallback: Available but 5-10x slower
## UI Integration

### PHI Indicator Component
```tsx
import { PHIIndicator } from "@/components/voice/PHIIndicator";

<PHIIndicator
  routing={result.routing}  // "cloud" | "hybrid" | "local"
  phiScore={result.phi_score}
  showDetails={true}
/>;
```
### Visual States
| Routing | Icon | Color | Tooltip |
|---|---|---|---|
| cloud | ☁️ | Blue | "Using cloud transcription" |
| hybrid | 🔒 | Yellow | "Sensitive content detected" |
| local | 🛡️ | Green | "Secure local processing" |
## Subscribing to PHI Routing Updates (Frontend)

The `PHITelemetryService` provides real-time PHI routing state to the frontend via WebSocket events and a polling API.
### Option 1: WebSocket Subscription
```tsx
import { useEffect, useState } from "react";
import { useWebSocket } from "@/hooks/useWebSocket";

interface PHIState {
  sessionId: string;
  phiMode: "local" | "hybrid" | "cloud";
  phiScore: number;
  isSecureMode: boolean;
  hasPriorPhi: boolean;
  indicatorColor: "green" | "yellow" | "blue";
  indicatorIcon: "shield" | "lock" | "cloud";
  tooltip: string;
}

function usePHIRoutingState(sessionId: string) {
  const [phiState, setPHIState] = useState<PHIState | null>(null);
  const { subscribe, unsubscribe } = useWebSocket();

  useEffect(() => {
    // Subscribe to PHI telemetry events
    const handlePHIEvent = (event: { type: string; data: PHIState }) => {
      if (event.type === "phi.routing_decision" || event.type === "phi.mode_change") {
        setPHIState(event.data);
      }
    };

    subscribe(`phi.${sessionId}`, handlePHIEvent);
    return () => unsubscribe(`phi.${sessionId}`, handlePHIEvent);
  }, [sessionId, subscribe, unsubscribe]);

  return phiState;
}
```
### Option 2: REST API Polling
```tsx
// GET /api/voice/phi-state/{session_id}
// Returns current PHI routing state for the session
async function fetchPHIState(sessionId: string): Promise<PHIState> {
  const response = await fetch(`/api/voice/phi-state/${sessionId}`);
  return response.json();
}

// Example usage in a component
function PHIIndicator({ sessionId }: { sessionId: string }) {
  const [state, setState] = useState<PHIState | null>(null);

  useEffect(() => {
    const interval = setInterval(async () => {
      const newState = await fetchPHIState(sessionId);
      setState(newState);
    }, 1000); // Poll every second
    return () => clearInterval(interval);
  }, [sessionId]);

  if (!state) return null;

  return (
    <div className={`phi-indicator phi-${state.indicatorColor}`}>
      <span className="icon">{getIcon(state.indicatorIcon)}</span>
      <span className="tooltip">{state.tooltip}</span>
    </div>
  );
}
```
### Backend API for Frontend State
```python
# In your FastAPI router
from fastapi import HTTPException

from app.services.phi_stt_router import get_phi_stt_router

@router.get("/api/voice/phi-state/{session_id}")
async def get_phi_state(session_id: str):
    """Get current PHI routing state for frontend indicator."""
    # Use a distinct name so we don't shadow the FastAPI `router` above
    phi_router = get_phi_stt_router()
    state = phi_router.get_frontend_state(session_id)
    if state is None:
        raise HTTPException(404, "Session not found")
    return state
```
### Telemetry Event Types
| Event Type | Description | Payload |
|---|---|---|
| `phi.routing_decision` | New routing decision made | Full PHI state + previous mode |
| `phi.mode_change` | PHI mode changed (e.g., cloud → local) | From/to modes, reason |
| `phi.phi_detected` | PHI entities detected in audio | Score, entity types |
| `phi.session_start` | New PHI session initialized | Initial state |
| `phi.session_end` | PHI session ended | Final mode, had-PHI flag |
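One plausible shape for these events, sketched as a dataclass; `PHIEvent` and `make_mode_change_event` are hypothetical names, and the real `PHITelemetryService` payloads may differ:

```python
import time
from dataclasses import dataclass, field


@dataclass
class PHIEvent:
    type: str          # e.g. "phi.mode_change"
    session_id: str
    data: dict         # event-specific payload, per the table above
    ts: float = field(default_factory=time.time)


def make_mode_change_event(
    session_id: str, from_mode: str, to_mode: str, reason: str
) -> PHIEvent:
    """Build a phi.mode_change event carrying from/to modes and a reason."""
    return PHIEvent(
        type="phi.mode_change",
        session_id=session_id,
        data={"from": from_mode, "to": to_mode, "reason": reason},
    )
```

An event like this would be serialized and pushed over the `phi.{sessionId}` WebSocket channel that the frontend hook above subscribes to.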
## Audit Logging

All PHI routing decisions are logged for compliance:
```python
logger.info("PHI routing decision", extra={
    "session_id": session_id,
    "phi_score": 0.85,
    "routing_decision": "local",
    "detection_signals": ["medication_mention", "condition_name"],
    "provider": "local_whisper",
    "processing_time_ms": 234,
    "model": "whisper-large-v3",
})
```
### Prometheus Metrics
```python
# Routing distribution
stt_routing_total.labels(routing="local").inc()
stt_routing_total.labels(routing="cloud").inc()
stt_routing_total.labels(routing="hybrid").inc()

# PHI detection accuracy
phi_detection_score_histogram.observe(phi_score)

# Latency by routing type
stt_latency_ms.labels(routing="local").observe(234)
```
## Testing

### Unit Tests
```python
@pytest.mark.asyncio
async def test_phi_routing_high_score():
    """High PHI score routes to local Whisper."""
    router = PHISTTRouter()

    # Mock audio with PHI content
    audio = generate_test_audio("I take metformin for my diabetes")
    result = await router.transcribe(audio)

    assert result.routing == "local"
    assert result.phi_score >= 0.7
    assert result.provider == "local_whisper"


@pytest.mark.asyncio
async def test_phi_routing_low_score():
    """Low PHI score routes to cloud."""
    router = PHISTTRouter()

    # Mock audio without PHI
    audio = generate_test_audio("What is the weather today?")
    result = await router.transcribe(audio)

    assert result.routing == "cloud"
    assert result.phi_score < 0.3
```
### Integration Tests
```bash
# Run PHI routing tests
pytest tests/services/test_phi_stt_router.py -v

# Test with real audio samples
pytest tests/integration/test_phi_routing_e2e.py -v --audio-samples ./test_audio/
```
## Best Practices
- **Default to local for medical context**: If a session involves health topics, bias toward local processing.
- **Cache PHI decisions per session**: Avoid re-evaluating the same session on every utterance.
- **Monitor latency impact**: Local Whisper adds roughly 200 ms; account for this in latency budgets.
- **Update models regularly**: Refresh the Whisper model quarterly for accuracy improvements.
- **Keep an audit trail**: Log every routing decision for compliance audits.