VoiceAssist V2 Observability
Purpose: This document defines observability patterns for monitoring, logging, and alerting across all VoiceAssist services.
Last Updated: 2025-11-20
Overview
VoiceAssist V2 uses a three-pillar observability approach:
- Metrics - Prometheus for time-series metrics
- Logs - Structured logging with trace IDs
- Traces - Distributed tracing (optional in Phase 11-14)
Standard Service Endpoints
Every service must expose these endpoints:
Health Check (Liveness)
Endpoint: GET /health
Purpose: Kubernetes liveness probe - is the service process running?
Response:
{ "status": "healthy", "timestamp": "2025-11-20T12:34:56.789Z", "service": "kb-service", "version": "2.0.0" }
FastAPI Example:
from fastapi import APIRouter from datetime import datetime router = APIRouter(tags=["observability"]) @router.get("/health") async def health_check(): """Liveness probe - is service running?""" return { "status": "healthy", "timestamp": datetime.utcnow().isoformat(), "service": "kb-service", "version": "2.0.0", }
Readiness Check (Dependencies)
Endpoint: GET /ready
Purpose: Kubernetes readiness probe - are dependencies available?
Checks:
- Database connection (PostgreSQL)
- Redis connection
- Qdrant connection (if KB service)
- Nextcloud API (if applicable)
Response (Healthy):
{ "status": "ready", "timestamp": "2025-11-20T12:34:56.789Z", "dependencies": { "postgres": "healthy", "redis": "healthy", "qdrant": "healthy" } }
Response (Degraded):
{ "status": "degraded", "timestamp": "2025-11-20T12:34:56.789Z", "dependencies": { "postgres": "healthy", "redis": "unhealthy", "qdrant": "healthy" } }
FastAPI Example:
from fastapi import APIRouter, status from fastapi.responses import JSONResponse @router.get("/ready") async def readiness_check( db: Session = Depends(get_db), redis: Redis = Depends(get_redis), ): """Readiness probe - are dependencies healthy?""" dependencies = {} all_healthy = True # Check PostgreSQL try: await db.execute("SELECT 1") dependencies["postgres"] = "healthy" except Exception as e: dependencies["postgres"] = "unhealthy" all_healthy = False logger.error(f"PostgreSQL health check failed: {e}") # Check Redis try: await redis.ping() dependencies["redis"] = "healthy" except Exception as e: dependencies["redis"] = "unhealthy" all_healthy = False logger.error(f"Redis health check failed: {e}") # Check Qdrant (if KB service) if settings.SERVICE_NAME == "kb-service": try: await qdrant_client.health_check() dependencies["qdrant"] = "healthy" except Exception as e: dependencies["qdrant"] = "unhealthy" all_healthy = False logger.error(f"Qdrant health check failed: {e}") status_code = status.HTTP_200_OK if all_healthy else status.HTTP_503_SERVICE_UNAVAILABLE return JSONResponse( status_code=status_code, content={ "status": "ready" if all_healthy else "degraded", "timestamp": datetime.utcnow().isoformat(), "dependencies": dependencies, } )
Prometheus Metrics
Endpoint: GET /metrics
Purpose: Export metrics in Prometheus format
Response: Plain text Prometheus metrics
FastAPI Setup:
from prometheus_client import Counter, Histogram, Gauge, generate_latest, CONTENT_TYPE_LATEST from fastapi import Response # Define metrics chat_requests_total = Counter( 'chat_requests_total', 'Total chat requests', ['intent', 'phi_detected'] ) kb_search_duration_seconds = Histogram( 'kb_search_duration_seconds', 'KB search duration', buckets=[0.1, 0.5, 1.0, 2.0, 5.0, 10.0] ) tool_failure_total = Counter( 'tool_failure_total', 'External tool failures', ['tool', 'error_type'] ) phi_redacted_total = Counter( 'phi_redacted_total', 'PHI redaction events' ) indexing_jobs_active = Gauge( 'indexing_jobs_active', 'Currently running indexing jobs' ) @router.get("/metrics") async def metrics(): """Prometheus metrics endpoint.""" return Response( content=generate_latest(), media_type=CONTENT_TYPE_LATEST )
Key Metrics
Chat & Query Metrics
| Metric | Type | Labels | Purpose |
|---|---|---|---|
chat_requests_total | Counter | intent, phi_detected | Total chat requests by intent |
chat_duration_seconds | Histogram | intent | End-to-end chat latency |
streaming_messages_total | Counter | completed | Streaming message count |
phi_detected_total | Counter | - | PHI detection events |
phi_redacted_total | Counter | - | PHI redaction events |
Usage in Code:
async def process_chat(request: ChatRequest): phi_detected = await phi_detector.detect(request.message) # Increment counter chat_requests_total.labels( intent=request.intent, phi_detected=str(phi_detected.contains_phi) ).inc() # Time the request with chat_duration_seconds.labels(intent=request.intent).time(): response = await conductor.process_query(request) if phi_detected.contains_phi: phi_detected_total.inc() return response
KB & Search Metrics
| Metric | Type | Labels | Purpose |
|---|---|---|---|
kb_search_duration_seconds | Histogram | source_type | KB search latency |
kb_search_results_total | Histogram | - | Number of results returned |
kb_cache_hits_total | Counter | - | Redis cache hits |
kb_cache_misses_total | Counter | - | Redis cache misses |
embedding_generation_duration_seconds | Histogram | - | Embedding generation time |
Indexing Metrics
| Metric | Type | Labels | Purpose |
|---|---|---|---|
indexing_jobs_active | Gauge | - | Currently running jobs |
indexing_jobs_total | Counter | state | Total jobs by final state |
indexing_duration_seconds | Histogram | - | Time to index document |
chunks_created_total | Counter | source_type | Total chunks created |
Tool Invocation Metrics
VoiceAssist uses a comprehensive tools system (see TOOLS_AND_INTEGRATIONS.md) that requires detailed observability.
| Metric | Type | Labels | Purpose |
|---|---|---|---|
voiceassist_tool_calls_total | Counter | tool_name, status | Total tool calls by status (completed, failed, timeout, cancelled) |
voiceassist_tool_execution_duration_seconds | Histogram | tool_name | Tool execution duration (p50, p95, p99) |
voiceassist_tool_confirmation_required_total | Counter | tool_name, confirmed | Tool calls requiring user confirmation |
voiceassist_tool_phi_detected_total | Counter | tool_name | Tool calls with PHI detected |
voiceassist_tool_errors_total | Counter | tool_name, error_code | Tool execution errors by code |
voiceassist_tool_timeouts_total | Counter | tool_name | Tool execution timeouts |
voiceassist_tool_active_calls | Gauge | tool_name | Currently executing tool calls |
Status Label Values:
completed- Tool executed successfullyfailed- Tool execution failed with errortimeout- Tool execution exceeded timeoutcancelled- User cancelled tool execution
Common Error Codes:
VALIDATION_ERROR- Invalid argumentsPERMISSION_DENIED- User lacks permissionEXTERNAL_API_ERROR- External service failureTIMEOUT- Execution timeoutPHI_VIOLATION- PHI sent to non-PHI tool
Usage in Tool Execution:
# server/app/services/orchestration/tool_executor.py from prometheus_client import Counter, Histogram, Gauge from contextvars import ContextVar import time # Metrics tool_calls_total = Counter( 'voiceassist_tool_calls_total', 'Total tool invocations', ['tool_name', 'status'] ) tool_execution_duration = Histogram( 'voiceassist_tool_execution_duration_seconds', 'Tool execution duration', ['tool_name'], buckets=[0.1, 0.5, 1.0, 2.0, 5.0, 10.0, 30.0, 60.0] ) tool_confirmation_required = Counter( 'voiceassist_tool_confirmation_required_total', 'Tool calls requiring confirmation', ['tool_name', 'confirmed'] ) tool_phi_detected = Counter( 'voiceassist_tool_phi_detected_total', 'Tool calls with PHI detected', ['tool_name'] ) tool_errors = Counter( 'voiceassist_tool_errors_total', 'Tool execution errors', ['tool_name', 'error_code'] ) tool_timeouts = Counter( 'voiceassist_tool_timeouts_total', 'Tool execution timeouts', ['tool_name'] ) tool_active_calls = Gauge( 'voiceassist_tool_active_calls', 'Currently executing tool calls', ['tool_name'] ) async def execute_tool( tool_name: str, args: dict, user: UserContext, trace_id: str, ) -> ToolResult: """ Execute a tool with comprehensive metrics tracking. See: docs/TOOLS_AND_INTEGRATIONS.md See: docs/ORCHESTRATION_DESIGN.md#tool-execution-engine """ start_time = time.time() status = "failed" # Default to failed # Increment active calls tool_active_calls.labels(tool_name=tool_name).inc() try: # Get tool definition tool_def = TOOL_REGISTRY.get(tool_name) if not tool_def: tool_errors.labels(tool_name=tool_name, error_code="TOOL_NOT_FOUND").inc() raise ToolNotFoundError(f"Tool {tool_name} not found") # Check for PHI in arguments phi_result = await phi_detector.detect_in_dict(args) if phi_result.contains_phi: tool_phi_detected.labels(tool_name=tool_name).inc() # Ensure tool allows PHI if not tool_def.allows_phi: tool_errors.labels(tool_name=tool_name, error_code="PHI_VIOLATION").inc() raise ToolPHIViolationError( f"Tool {tool_name} cannot process PHI" ) # Check if confirmation required if tool_def.requires_confirmation: confirmed = await request_user_confirmation(tool_name, args, user, trace_id) tool_confirmation_required.labels( tool_name=tool_name, confirmed=str(confirmed).lower() ).inc() if not confirmed: status = "cancelled" return ToolResult( success=False, error_code="USER_CANCELLED", error_message="User cancelled tool execution" ) # Execute tool with timeout timeout_seconds = tool_def.timeout_seconds try: async with asyncio.timeout(timeout_seconds): result = await tool_def.execute(args, user, trace_id) status = "completed" return result except asyncio.TimeoutError: status = "timeout" tool_timeouts.labels(tool_name=tool_name).inc() raise ToolTimeoutError( f"Tool {tool_name} exceeded timeout ({timeout_seconds}s)" ) except ToolError as e: status = "failed" tool_errors.labels(tool_name=tool_name, error_code=e.error_code).inc() raise except Exception as e: status = "failed" tool_errors.labels(tool_name=tool_name, error_code="UNKNOWN_ERROR").inc() raise finally: # Record metrics duration = time.time() - start_time tool_execution_duration.labels(tool_name=tool_name).observe(duration) tool_calls_total.labels(tool_name=tool_name, status=status).inc() tool_active_calls.labels(tool_name=tool_name).dec() # Structured logging logger.info( "Tool execution completed", extra={ "tool_name": tool_name, "status": status, "duration_ms": int(duration * 1000), "phi_detected": phi_result.contains_phi if 'phi_result' in locals() else False, "trace_id": trace_id, "user_id": user.id, } )
External Tool Metrics (Legacy)
For backward compatibility, external API calls also emit these metrics:
| Metric | Type | Labels | Purpose |
|---|---|---|---|
tool_requests_total | Counter | tool | Total external API requests (legacy) |
tool_failure_total | Counter | tool, error_type | External tool failures (legacy) |
tool_duration_seconds | Histogram | tool | External tool latency (legacy) |
Note: New code should use voiceassist_tool_* metrics above. These legacy metrics are maintained for backward compatibility with Phase 5 implementations.
Logging Conventions
Log Structure
Every log line must include:
timestamp(ISO 8601 UTC)level(DEBUG, INFO, WARNING, ERROR, CRITICAL)service(service name)trace_id(from request)message(log message)session_id(if applicable)user_id(if applicable, never with PHI)
JSON Format:
{ "timestamp": "2025-11-20T12:34:56.789Z", "level": "INFO", "service": "kb-service", "trace_id": "550e8400-e29b-41d4-a716-446655440000", "session_id": "abc123", "user_id": "user_456", "message": "KB search completed", "duration_ms": 1234, "results_count": 5 }
Python Logging Setup
import logging import json from datetime import datetime from contextvars import ContextVar # Context var for trace_id trace_id_var: ContextVar[str] = ContextVar('trace_id', default='') class JSONFormatter(logging.Formatter): """Format logs as JSON.""" def format(self, record): log_data = { "timestamp": datetime.utcnow().isoformat(), "level": record.levelname, "service": settings.SERVICE_NAME, "trace_id": trace_id_var.get(), "message": record.getMessage(), } # Add extra fields if hasattr(record, 'session_id'): log_data['session_id'] = record.session_id if hasattr(record, 'user_id'): log_data['user_id'] = record.user_id if hasattr(record, 'duration_ms'): log_data['duration_ms'] = record.duration_ms # Add exception info if record.exc_info: log_data['exception'] = self.formatException(record.exc_info) return json.dumps(log_data) # Configure logger logger = logging.getLogger("voiceassist") handler = logging.StreamHandler() handler.setFormatter(JSONFormatter()) logger.addHandler(handler) logger.setLevel(logging.INFO)
PHI Logging Rules
CRITICAL: PHI must NEVER be logged directly.
Allowed:
- Session IDs (UUIDs)
- User IDs (UUIDs)
- Document IDs
- Trace IDs
- Intent types
- Error codes
- Counts and aggregates
FORBIDDEN:
- Patient names
- Patient dates of birth
- Medical record numbers
- Actual query text (if contains PHI)
- Clinical context details
- Document content
Instead of logging query text:
# Bad - may contain PHI logger.info(f"Processing query: {query}") # Good - log query hash or length logger.info( "Processing query", extra={ "query_length": len(query), "query_hash": sha256(query.encode()).hexdigest()[:8], "phi_detected": phi_result.contains_phi, } )
Alerting Rules
Critical Alerts (Page On-Call)
| Alert | Condition | Action |
|---|---|---|
| Service Down | Health check failing > 2 minutes | Page on-call engineer |
| Database Unavailable | PostgreSQL readiness check failing | Page DBA + engineer |
| High Error Rate | Error rate > 5% for 5 minutes | Page on-call engineer |
| PHI Leak Detected | PHI in logs or external API call | Page security team immediately |
Warning Alerts (Slack Notification)
| Alert | Condition | Action |
|---|---|---|
| High Latency | p95 latency > 5s for 10 minutes | Notify #engineering |
| KB Search Timeouts | > 10% timeout rate for 5 minutes | Notify #engineering |
| External Tool Failures | > 20% failure rate for 10 minutes | Notify #engineering |
| Indexing Job Failures | > 3 failed jobs in 1 hour | Notify #admin |
Example Prometheus Alert Rules
# alerts.yml groups: - name: voiceassist rules: - alert: HighChatLatency expr: histogram_quantile(0.95, chat_duration_seconds_bucket) > 5 for: 10m labels: severity: warning annotations: summary: "High chat latency detected" description: "95th percentile chat latency is {{ $value }}s" - alert: HighErrorRate expr: rate(chat_requests_total{status="error"}[5m]) > 0.05 for: 5m labels: severity: critical annotations: summary: "High error rate detected" description: "Error rate is {{ $value | humanizePercentage }}" - alert: ExternalToolFailures expr: rate(tool_failure_total[5m]) > 0.2 for: 10m labels: severity: warning annotations: summary: "High external tool failure rate" description: "Tool {{ $labels.tool }} failing at {{ $value | humanizePercentage }}"
Grafana Dashboards
Suggested Dashboards
-
System Overview
- Request rate (requests/sec)
- Error rate (%)
- Latency (p50, p95, p99)
- Active sessions
-
Chat Service
- Chat requests by intent
- Streaming vs non-streaming
- PHI detection rate
- Citations per response
-
Knowledge Base
- KB search latency
- Cache hit rate
- Indexing job status
- Document count by source type
-
External Tools
- Tool request rate
- Tool failure rate
- Tool latency by tool
- Cost tracking (API usage)
Distributed Tracing (Phase 11-14)
For microservices deployment, add distributed tracing:
Tools: Jaeger or OpenTelemetry
Trace Spans:
- Chat request (root span)
- PHI detection
- KB search
- External tool calls (parallel)
- LLM generation
- Safety filters
Benefits:
- Visualize request flow across services
- Identify bottlenecks
- Debug distributed failures
Related Documentation
- ARCHITECTURE_V2.md - System architecture
- SECURITY_COMPLIANCE.md - HIPAA logging requirements
- ADMIN_PANEL_SPECS.md - Admin metrics dashboard
- server/README.md - API implementation
Summary
- All services expose
/health,/ready,/metrics - Metrics use Prometheus format
- Logs use structured JSON with trace IDs
- PHI must NEVER be logged
- Critical alerts page on-call
- Grafana dashboards for monitoring