VoiceAssist Architecture V2 - Enterprise Microservices
System Overview
VoiceAssist V2 is an enterprise-grade, HIPAA-compliant, multi-user medical AI assistant designed to support hundreds of concurrent users with high availability, security, and performance.
Architecture Diagram
Key Architectural Decision: Nextcloud is a separate stack, not part of VoiceAssist deployment.
┌─────────────────────────────────────────────────────────────────┐
│ Users (Web/Mobile) │
│ Browser / Mobile Apps │
└────────────────┬────────────────────┬────────────────────────────┘
│ │
┌──────┴──────┐ ┌──────┴──────┐
│ │ │ │
v │ v │
┌───────────────────┐ │ ┌──────────────────────────────────────┐
│ Nextcloud Stack │ │ │ VoiceAssist Microservices Stack │
│ (Separate) │ │ │ (This Repository) │
│ │ │ │ │
│ - Identity/SSO │◄──┼──│ All VoiceAssist services integrate │
│ - File Storage │ │ │ with Nextcloud via HTTP APIs │
│ - Calendar │ │ │ │
│ - Email │ │ │ Environment variables: │
│ - User Directory │ │ │ NEXTCLOUD_BASE_URL │
│ │ │ │ NEXTCLOUD_OIDC_ISSUER │
│ Local Dev: │ │ │ NEXTCLOUD_CLIENT_ID │
│ ~/Nextcloud-Dev/ │ │ │ NEXTCLOUD_CLIENT_SECRET │
│ │ │ │ │
│ Production: │ │ │ │
│ cloud.asimo.io │ │ │ │
└───────────────────┘ │ └──────────────────────────────────────┘
│
│ HTTPS / OIDC / WebDAV APIs
│
v
┌──────────────────────────────────────────────────────────────────┐
│ VoiceAssist Microservices (Docker Compose) │
│ │
│ ┌────────────────┐ ┌────────────────┐ ┌──────────────────┐ │
│ │ API Gateway │ │ Auth Service │ │ Voice Proxy │ │
│ │ (Kong) │ │ (JWT/OIDC) │ │ (WebRTC) │ │
│ └────────────────┘ └────────────────┘ └──────────────────┘ │
│ │
│ ┌────────────────┐ ┌────────────────┐ ┌──────────────────┐ │
│ │ Medical KB │ │ Admin API │ │ File Indexer │ │
│ │ (RAG/AI) │ │ (Management) │ │ (Nextcloud) │ │
│ └────────────────┘ └────────────────┘ └──────────────────┘ │
│ │
│ ┌────────────────┐ ┌────────────────┐ ┌──────────────────┐ │
│ │ Calendar/ │ │ Guideline │ │ Medical │ │
│ │ Email Service │ │ Scraper │ │ Calculator │ │
│ └────────────────┘ └────────────────┘ └──────────────────┘ │
└───────────────────────────┬───────────────────────────────────┘
│
┌───────────────────────────┴───────────────────────────────────┐
│ Data Layer (Docker Compose) │
│ │
│ ┌──────────────────┐ ┌──────────────┐ ┌─────────────────┐ │
│ │ PostgreSQL │ │ Redis │ │ Qdrant │ │
│ │ (pgvector) │ │ (Cache) │ │ (Vectors) │ │
│ └──────────────────┘ └──────────────┘ └─────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
┌───────────────────────────┴───────────────────────────────────┐
│ Observability Stack (Docker Compose) │
│ ┌────────────┬────────────┬────────────┬───────────────┐ │
│ │ Prometheus │ Grafana │ Jaeger │ Loki (Logs) │ │
│ └────────────┴────────────┴────────────┴───────────────┘ │
└─────────────────────────────────────────────────────────────────┘
Key Architecture Principles
1. Nextcloud Separation
- Nextcloud runs as a completely separate stack
- Local Development: `~/Nextcloud-Dev/docker-compose.yml`
- Production: Separate deployment (existing server or dedicated cluster)
- VoiceAssist integrates via standard APIs (OIDC, WebDAV, CalDAV, CardDAV)
2. Integration Pattern
- VoiceAssist services are clients of Nextcloud
- Communication via HTTP/HTTPS APIs
- No shared Docker Compose project
- No shared databases between stacks
- Environment variables configure the connection
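Because environment variables are the only coupling between the two stacks, a small settings loader is enough to wire a service to Nextcloud. The sketch below is illustrative: the class and function names are hypothetical, and the default paths (`/apps/oidc`, `/remote.php/dav`) are the ones this document lists under the Nextcloud environment variables.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class NextcloudSettings:
    """Connection settings for the separate Nextcloud stack (hypothetical type)."""
    base_url: str
    oidc_issuer: str
    client_id: str
    client_secret: str
    webdav_url: str


def load_nextcloud_settings(env: Optional[Mapping[str, str]] = None) -> NextcloudSettings:
    """Read NEXTCLOUD_* variables, deriving URLs from the base URL when unset."""
    env = os.environ if env is None else env
    # A missing base URL raises KeyError so misconfiguration fails fast at startup.
    base_url = env["NEXTCLOUD_BASE_URL"].rstrip("/")
    return NextcloudSettings(
        base_url=base_url,
        oidc_issuer=env.get("NEXTCLOUD_OIDC_ISSUER", f"{base_url}/apps/oidc"),
        client_id=env["NEXTCLOUD_CLIENT_ID"],
        client_secret=env["NEXTCLOUD_CLIENT_SECRET"],
        webdav_url=env.get("NEXTCLOUD_WEBDAV_URL", f"{base_url}/remote.php/dav"),
    )
```

Passing a plain dict instead of `os.environ` keeps the loader easy to test, and switching between local and production Nextcloud only requires changing `NEXTCLOUD_BASE_URL`.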
3. Deployment Independence
- Nextcloud can be updated/restarted without affecting VoiceAssist
- VoiceAssist can be updated/restarted without affecting Nextcloud
- Separate monitoring and logging (though they can be aggregated)
Authentication Flow:
User → Browser → Nextcloud Login (OIDC) → JWT Token → VoiceAssist Services
Local Development Architecture
MacBook Pro
├── ~/Nextcloud-Dev/ # Separate Nextcloud Stack
│ ├── docker-compose.yml # Nextcloud + DB
│ ├── data/ # Nextcloud files
│ ├── db/ # Nextcloud DB
│ └── config/ # Nextcloud config
│
│ Running at: http://localhost:8080
│
└── ~/VoiceAssist/ # VoiceAssist Stack
├── docker-compose.yml # All VoiceAssist services
├── services/ # Microservices code
├── data/ # VoiceAssist data
└── .env # Includes NEXTCLOUD_BASE_URL=http://localhost:8080
Running at: http://localhost:8000 (API Gateway)
┌──────────────────────────┴──────────────────────────────────────┐
│  API Gateway (Kong/Nginx)                                       │
│  Rate Limiting │ Auth │ Routing │ Logging                       │
│                                                                 │
│  NOTE: Phases 0-10 - No separate gateway, FastAPI handles all   │
│        Phases 11-14 - Extract to Kong/Nginx for microservices   │
└──────────────────────────┬──────────────────────────────────────┘
                           │
┌──────────────────────────┴──────────────────────────────────────┐
│  Service Mesh (Linkerd/Istio)                                   │
│  mTLS │ Service Discovery │ Load Balancing │ Policies           │
│                                                                 │
│  NOTE: Phases 0-10 - Not needed (single app)                    │
│        Phases 11-14 - Add for microservices security            │
│                                                                 │
│   ┌─────────────────┬─────────────────┬─────────────────────┐   │
│   │ Voice Proxy     │ Medical KB      │ Admin API           │   │
│   │ Service         │ Service         │ Service             │   │
│   │ - WebRTC/WS     │ - RAG Engine    │ - Config Mgmt       │   │
│   │ - OpenAI API    │ - Orchestrator  │ - User Mgmt         │   │
│   │ - VAD/AEC       │ - Embeddings    │ - Analytics         │   │
│   │ - Context       │ - PubMed        │ - RBAC              │   │
│   └─────────────────┴─────────────────┴─────────────────────┘   │
│                                                                 │
│  Phases 0-10:  Logical services (modules/routers in             │
│                services/api-gateway/)                           │
│  Phases 11-14: Physical services (separate containers)          │
│                                                                 │
│   ┌─────────────────┬─────────────────┬─────────────────────┐   │
│   │ Auth Service    │ File Indexer    │ Calendar/Email      │   │
│   │ - JWT           │ - Local Files   │ Service             │   │
│   │ - MFA           │ - Nextcloud     │ - CalDAV            │   │
│   │ - RBAC          │ - Auto-index    │ - IMAP/SMTP         │   │
│   └─────────────────┴─────────────────┴─────────────────────┘   │
│                                                                 │
│   ┌─────────────────┬─────────────────┬─────────────────────┐   │
│   │ Guideline       │ Medical Calc    │ PHI Detection       │   │
│   │ Scraper         │ Service         │ Service             │   │
│   │ - CDC/WHO       │ - Wells/GRACE   │ - Redaction         │   │
│   │ - Auto-update   │ - Renal Dosing  │ - Classification    │   │
│   └─────────────────┴─────────────────┴─────────────────────┘   │
└──────────────────────────┬──────────────────────────────────────┘
                           │
┌──────────────────────────┴──────────────────────────────────────┐
│  Data Layer (Kubernetes)                                        │
│                                                                 │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │ PostgreSQL Cluster (Primary + Replicas)                  │   │
│  │ - Users, Conversations, Documents                        │   │
│  │ - pgvector extension                                     │   │
│  │ - Encrypted at rest                                      │   │
│  └──────────────────────────────────────────────────────────┘   │
│                                                                 │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │ Redis Cluster (Master-Slave)                             │   │
│  │ - Sessions, Caching                                      │   │
│  │ - Pub/Sub for real-time                                  │   │
│  └──────────────────────────────────────────────────────────┘   │
│                                                                 │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │ Qdrant (Vector Database)                                 │   │
│  │ - Medical knowledge embeddings                           │   │
│  │ - Replicated for HA                                      │   │
│  └──────────────────────────────────────────────────────────┘   │
└──────────────────────────┬──────────────────────────────────────┘
                           │
┌──────────────────────────┴──────────────────────────────────────┐
│  Observability Stack                                            │
│    ┌───────────┬───────────┬───────────┬──────────────────┐     │
│    │Prometheus │ Grafana   │ Jaeger    │ Loki (Logs)      │     │
│    │(Metrics)  │(Dashboard)│ (Traces)  │ AlertManager     │     │
│    └───────────┴───────────┴───────────┴──────────────────┘     │
└─────────────────────────────────────────────────────────────────┘
## Key Architectural Decisions
### 1. Microservices Architecture
**Why:** Scalability, fault isolation, independent deployment, technology flexibility
## Backend Implementation Strategy
The backend uses a **monorepo-first, microservices-ready** architecture:
### Phases 0-10: Monorepo (Docker Compose Development)
- All services live in `services/api-gateway/` directory (canonical backend)
- Single FastAPI application with multiple routers
- Services are **logical boundaries** enforced through module structure
- Runs in a single container for rapid development
- Suitable for < 50 concurrent users
- See [BACKEND_ARCHITECTURE.md](BACKEND_ARCHITECTURE.md) for complete structure
- Note: `server/` directory is **DEPRECATED** - kept for reference only
**Why Start with Monorepo?**
- Faster development iteration (single codebase)
- Simpler debugging (all code in one place)
- Lower operational complexity (no K8s, no service mesh)
- Easier testing (integration tests within single app)
- Shared dependencies and models
### Phases 11-14: Microservices (Kubernetes Migration)
- Services can be extracted to separate containers
- Each service becomes an independent deployment
- Communication via HTTP/gRPC through service mesh
- Only split services that need independent scaling
- Suitable for > 50 concurrent users, high availability requirements
**When to Split:**
- Deployment to Kubernetes cluster
- Need for independent scaling (e.g., voice service needs more resources)
- Team growth (> 5 developers, need ownership boundaries)
- Different deployment cycles (e.g., ML model updates vs API changes)
### Service Catalog
**Logical Services** (see [SERVICE_CATALOG.md](SERVICE_CATALOG.md) for complete documentation):
- **API Gateway** - Entry point, routing, rate limiting (Phases 11-14 only)
- **Voice Proxy** - WebRTC/WebSocket handling, OpenAI Realtime API integration
- **Medical KB / RAG Service** - RAG engine, orchestrator, semantic search, embeddings
- **Admin API** - System management, config, analytics, real-time events via Redis pub/sub
- **Auth Service** - Authentication, JWT, MFA, RBAC
- **File Indexer / Ingestion** - Document processing, PDF/DOCX parsing, chunking
- **Calendar/Email Service** - CalDAV, IMAP/SMTP integration
- **Guideline Scraper** - Automated scraping of CDC/WHO guidelines
- **Medical Calculator** - Clinical calculators and scoring
- **PHI Detection** - Classify and redact PHI from logs/data
- **External APIs Service** - PubMed, UpToDate, Nextcloud integrations
- **Search Service** - Vector search, hybrid search (dense + sparse)
**Note:** In Phases 0-10, these services are modules/routers within the single FastAPI app. In Phases 11-14, they can be extracted to separate containers if needed.
**Medical KB / RAG Enhancements (Phase 12 hardening):**
- Model adapter registry exposes BioGPT and PubMedBERT behind feature toggles (`ENABLE_BIOGPT_ADAPTER`, `ENABLE_PUBMEDBERT_ADAPTER`) while keeping the default OpenAI model visible in API metadata.
- Query orchestrator now performs query decomposition and multi-hop retrieval when enabled (`ENABLE_QUERY_DECOMPOSITION`, `ENABLE_MULTI_HOP_RETRIEVAL`), synthesizing context across documents before calling the LLM.
- Responses surface model provenance, selection confidence, retrieval confidence, and reasoning path data so downstream clients can render transparency badges without additional lookups.
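The feature toggles named above can be read as a registry lookup: the default model is always listed, and each optional adapter appears only when its flag is set. The following is a minimal sketch of that idea, not the project's actual registry. The adapter names, the `ADAPTER_FLAGS` table, and the accepted truthy values are assumptions; only the two environment variable names come from this document.

```python
import os
from typing import Dict, List, Mapping, Optional

# Hypothetical mapping: adapter name -> environment flag that enables it.
# The default OpenAI model has no flag and is always listed.
ADAPTER_FLAGS: Dict[str, Optional[str]] = {
    "openai-default": None,
    "biogpt": "ENABLE_BIOGPT_ADAPTER",
    "pubmedbert": "ENABLE_PUBMEDBERT_ADAPTER",
}


def flag_enabled(env: Mapping[str, str], name: str) -> bool:
    """Treat "1", "true", "yes", "on" (any case) as enabled; anything else is off."""
    return env.get(name, "").strip().lower() in {"1", "true", "yes", "on"}


def available_adapters(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return adapter names that should be visible in API metadata."""
    env = os.environ if env is None else env
    return [
        name
        for name, flag in ADAPTER_FLAGS.items()
        if flag is None or flag_enabled(env, flag)
    ]
```

Keeping the default model outside the toggle logic means a misconfigured flag can hide an optional adapter but never the baseline model.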
### 2. Kubernetes Orchestration
**Local Development:** K3s (lightweight K8s)
**Production:** Full Kubernetes cluster
**Benefits:**
- Container orchestration
- Auto-scaling (HorizontalPodAutoscaler)
- Self-healing (pod restarts)
- Rolling updates
- Resource management
- Declarative infrastructure
**Key K8s Resources:**
- **Deployments** - For stateless services
- **StatefulSets** - For databases
- **Services** - Internal service discovery
- **Ingress** - External routing
- **ConfigMaps/Secrets** - Configuration management
- **PersistentVolumeClaims** - Data storage
- **NetworkPolicies** - Network segmentation
### 3. Service Mesh (Linkerd/Istio)
**Purpose:** Service-to-service security, observability, traffic management
**Features:**
- **mTLS** - Automatic mutual TLS between all services
- **Service Discovery** - Dynamic service location
- **Load Balancing** - Intelligent request distribution
- **Circuit Breaking** - Prevent cascade failures
- **Retry Logic** - Automatic retries with backoff
- **Traffic Splitting** - Canary deployments, A/B testing
- **Distributed Tracing** - Request flow visualization
- **Metrics Collection** - Automatic Prometheus metrics
**Choice:** **Linkerd** recommended for simplicity and performance
### 4. Nextcloud Integration (Separate Stack)
**Why:** Unified user management, SSO, file storage, calendar, email
**Architecture Decision:** Nextcloud is a **separate deployment**, not part of VoiceAssist.
**Integration Points:**
- **User Directory** - Single source of truth for users (via API)
- **SSO** - OAuth2/OIDC authentication
- **File Storage** - Access via WebDAV API
- **Calendar** - CalDAV integration
- **Email** - Access via IMAP/SMTP or Nextcloud Mail API
- **Apps** - VoiceAssist web clients can be Nextcloud apps (optional)
**Local Development:**
~/Nextcloud-Dev/                    ~/VoiceAssist/
├── docker-compose.yml              ├── docker-compose.yml
│   └── nextcloud service           │   └── voiceassist services
│   └── postgres (for Nextcloud)    │   └── postgres (for VoiceAssist)
│                                   │
│   Port: 8080                      │   Port: 8000 (API Gateway)
│                                   │
│                                   │   VoiceAssist connects via:
│                                   │   NEXTCLOUD_BASE_URL=http://localhost:8080
**Production:**
Nextcloud (Separate Server/Cluster) VoiceAssist (This System)
- cloud.asimo.io - voiceassist.asimo.io
- Managed independently - Connects via HTTPS
- Can be existing NC installation - Environment: NEXTCLOUD_BASE_URL=https://cloud.asimo.io
**Authentication Flow:**
- User → https://voiceassist.asimo.io
- Redirect → Nextcloud OIDC (cloud.asimo.io/apps/oidc)
- User logs in to Nextcloud (MFA if enabled)
- Nextcloud returns authorization code
- VoiceAssist exchanges code for JWT token
- VoiceAssist validates token and creates session
- User accesses VoiceAssist with valid session
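The redirect and code-exchange steps above follow the standard OAuth 2.0 authorization code flow. The sketch below builds the two requests involved using only the standard library. The function names are hypothetical, and the authorize endpoint URL must come from the provider's configuration (for OIDC providers, typically its discovery document) rather than being hard-coded as in the usage example.

```python
import secrets
from typing import Dict
from urllib.parse import urlencode


def new_state() -> str:
    """Random value stored in the session and checked on callback (CSRF protection)."""
    return secrets.token_urlsafe(32)


def build_authorize_url(authorize_endpoint: str, client_id: str,
                        redirect_uri: str, state: str,
                        scope: str = "openid profile email") -> str:
    """URL the browser is redirected to so the user can log in at Nextcloud."""
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,
    })
    return f"{authorize_endpoint}?{query}"


def build_token_request(code: str, client_id: str, client_secret: str,
                        redirect_uri: str) -> Dict[str, str]:
    """Form fields for the server-to-server POST that exchanges the code for tokens."""
    return {
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,
        "client_id": client_id,
        "client_secret": client_secret,
    }
```

For local development this might be called as `build_authorize_url("http://localhost:8080/apps/oidc/authorize", "voiceassist", "http://localhost:8000/auth/callback", new_state())`, where the `/authorize` path is an assumed example. The callback handler should reject any response whose `state` does not match the stored value before exchanging the code.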
**Integration Method:**
- **NOT** via shared Docker Compose project
- **NOT** via shared database
- **YES** via HTTP APIs (OIDC, WebDAV, CalDAV, etc.)
- **YES** via environment variable configuration
**Environment Variables Required:**
```bash
# Nextcloud Connection
NEXTCLOUD_BASE_URL=http://localhost:8080 # or https://cloud.asimo.io
NEXTCLOUD_OIDC_ISSUER=${NEXTCLOUD_BASE_URL}/apps/oidc
NEXTCLOUD_CLIENT_ID=voiceassist
NEXTCLOUD_CLIENT_SECRET=secret_from_nextcloud
NEXTCLOUD_REDIRECT_URI=http://localhost:8000/auth/callback
# Nextcloud APIs
NEXTCLOUD_WEBDAV_URL=${NEXTCLOUD_BASE_URL}/remote.php/dav
NEXTCLOUD_CALDAV_URL=${NEXTCLOUD_BASE_URL}/remote.php/dav/calendars
NEXTCLOUD_CARDDAV_URL=${NEXTCLOUD_BASE_URL}/remote.php/dav/addressbooks
# Admin credentials (for service account operations)
NEXTCLOUD_ADMIN_USER=admin
NEXTCLOUD_ADMIN_PASSWORD=secure_password
```
### 5. Admin Panel Integration
Purpose: Centralized management, monitoring, and observability for platform administrators.
URLs:
- Admin Panel: https://admin.asimo.io (React 18 + Vite)
- Web App: https://dev.asimo.io (React 18 + Vite)
- Docs Site: https://assistdocs.asimo.io (Next.js 14 static export)
Architecture:
┌─────────────────┐     ┌─────────────────┐     ┌────────────────────┐
│   Admin Panel   │     │     Web App     │     │     Docs Site      │
│ admin.asimo.io  │────▶│  dev.asimo.io   │────▶│assistdocs.asimo.io │
└────────┬────────┘     └────────┬────────┘     └────────────────────┘
         │                       │
         └───────────┬───────────┘
                     ▼
          ┌─────────────────────┐
          │     API Gateway     │
          │ /api/admin/panel/*  │
          └──────────┬──────────┘
                     │
          ┌──────────┴──────────┐
          │                     │
     ┌────▼─────┐         ┌─────▼────┐
     │ Database │         │  Redis   │
     │PostgreSQL│         │ Pub/Sub  │
     └──────────┘         └──────────┘
Key Features:
- Cross-App Navigation: Unified navigation between admin, web app, and docs
- Conversations Management: View all user conversations with message history
- Clinical Contexts: HIPAA-compliant PHI access with audit logging
- Voice Monitor: Real-time visibility into voice sessions and the Thinker-Talker (TT) pipeline
- Real-Time Events: WebSocket with Redis pub/sub for live updates
Admin Panel Endpoints:
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/admin/panel/conversations | List all conversations |
| GET | /api/admin/panel/clinical-contexts | List clinical contexts |
| POST | /api/admin/panel/clinical-contexts/{id}/reveal | Reveal PHI (audited) |
| GET | /api/admin/panel/voice/sessions | List voice sessions |
| GET | /api/admin/panel/voice/tt-sessions | TT pipeline state |
| GET | /api/admin/panel/voice/tt-analytics | TT performance metrics |
| WS | /api/admin/panel/ws | Real-time event stream |
Real-Time Event Types:
- `session.connected` / `session.disconnected` - User sessions
- `conversation.created` / `conversation.updated` - Conversations
- `voice.session_started` / `voice.session_ended` - Voice mode
- `tt.state_changed` / `tt.tool_called` - TT pipeline
- `phi.accessed` - PHI audit events (immediate)
- `system.alert` - System notifications (immediate)
Event Publishing:

    from app.services.admin_event_publisher import (
        publish_voice_session_started,
        publish_tt_state_changed,
        publish_phi_accessed,
    )

    # Events are buffered and batched via Redis
    await publish_voice_session_started(user_id, session_id, "realtime", "alloy")

    # High-priority events bypass buffer
    await publish_phi_accessed(admin_id, admin_email, context_id, target_user_id)
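The buffering behaviour described in the comments above (ordinary events batched, high-priority events sent at once) can be illustrated with a small in-memory buffer. This is a sketch under stated assumptions, not the `admin_event_publisher` implementation: the class name, the `sink` callable standing in for a Redis publish call, and the batch size are all hypothetical. The set of immediate event types is taken from the event list in this section.

```python
from typing import Callable, Dict, List

# Event types this document marks as "(immediate)".
IMMEDIATE_EVENTS = {"phi.accessed", "system.alert"}


class AdminEventBuffer:
    """Batch ordinary events; send high-priority events right away."""

    def __init__(self, sink: Callable[[List[Dict]], None], max_batch: int = 10):
        self._sink = sink              # hypothetical stand-in for a Redis publish call
        self._max_batch = max_batch
        self._pending: List[Dict] = []

    def publish(self, event_type: str, payload: Dict) -> None:
        event = {"type": event_type, "payload": payload}
        if event_type in IMMEDIATE_EVENTS:
            self._sink([event])        # bypass the buffer entirely
            return
        self._pending.append(event)
        if len(self._pending) >= self._max_batch:
            self.flush()

    def flush(self) -> None:
        """Send whatever is buffered as one batch; no-op when empty."""
        if self._pending:
            batch, self._pending = self._pending, []
            self._sink(batch)
```

One consequence of this design is that an immediate event can reach subscribers before ordinary events that were published earlier but are still buffered, so consumers should not rely on cross-type ordering. A real publisher would also flush on a timer so that a quiet system does not hold events indefinitely.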
Frontend Integration:

    import { useRealtimeEvents } from "@/hooks/useRealtimeEvents";

    const { status, events, metrics } = useRealtimeEvents({
      autoConnect: true,
      eventFilter: ["voice.session_started", "tt.state_changed"],
      onEvent: (event) => console.log(event),
    });
Environment Variables:

    # Admin Panel
    ADMIN_PANEL_ENABLED=true
    ADMIN_PANEL_CORS_ORIGINS=https://admin.asimo.io

    # Redis for real-time events
    REDIS_URL=redis://localhost:6379
    ADMIN_EVENTS_CHANNEL=admin:events

    # Cross-app URLs
    VITE_WEB_APP_URL=https://dev.asimo.io
    VITE_DOCS_URL=https://assistdocs.asimo.io
Documentation:
### 6. Zero-Trust Security Model
Principles:
- Never trust, always verify
- Assume breach
- Verify explicitly
- Use least privilege
- Segment access
Implementation:
- mTLS for all inter-service communication
- Short-lived JWT tokens (5-15 minutes)
- Token refresh mechanism
- Network policies to restrict traffic
- Open Policy Agent for fine-grained authorization
- PHI detection and automatic redaction
- Audit logging for all access
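To make the short-lived token item concrete, here is a minimal HS256 JWT sketch built on the standard library, with a 15-minute default lifetime. It is for illustration only: a production service would normally use a maintained JWT library and the signing scheme of its identity provider. The function names and the choice of a shared-secret HMAC signature are assumptions, not details from this document.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that the JWT encoding strips.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_token(subject: str, secret: bytes, ttl_seconds: int = 900,
                now: Optional[float] = None) -> str:
    """Create a signed token that expires ttl_seconds after issuance."""
    now = time.time() if now is None else now
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    claims = _b64url(json.dumps(
        {"sub": subject, "iat": int(now), "exp": int(now) + ttl_seconds}
    ).encode())
    signature = hmac.new(secret, f"{header}.{claims}".encode("ascii"), hashlib.sha256).digest()
    return f"{header}.{claims}.{_b64url(signature)}"


def verify_token(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Return the claims, or raise ValueError if the token is invalid or expired."""
    now = time.time() if now is None else now
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header, claims, signature = parts
    expected = hmac.new(secret, f"{header}.{claims}".encode("ascii"), hashlib.sha256).digest()
    # Constant-time comparison avoids leaking signature bytes through timing.
    if not hmac.compare_digest(expected, _b64url_decode(signature)):
        raise ValueError("bad signature")
    payload = json.loads(_b64url_decode(claims))
    if now >= payload["exp"]:
        raise ValueError("token expired")
    return payload
```

The signature is checked before the claims are parsed, so untrusted JSON is never interpreted unless it was produced by a holder of the secret. With a 5 to 15 minute lifetime, a leaked token is only useful briefly, which is why the refresh mechanism listed above is needed alongside it.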
### 7. HIPAA Compliance
Key Requirements:
- Encryption in transit - TLS 1.2+, mTLS
- Encryption at rest - Database encryption, encrypted backups
- Access controls - RBAC, MFA, audit logs
- PHI protection - Never log PHI, redact when necessary
- Backup and recovery - Encrypted backups, tested recovery
- Audit trail - Log all access to ePHI
- Risk analysis - Periodic security assessments
### 8. High Availability Design
Database Layer:
- PostgreSQL primary with streaming replication
- Read replicas for load distribution
- Automatic failover
Application Layer:
- Multiple replicas per service (min 2 in production)
- Load balancing across replicas
- Health checks and auto-restart
Network Layer:
- Multiple availability zones (if cloud)
- Load balancer with health checks
- DNS failover
Target SLAs:
- Availability: 99.9% (8.76 hours downtime/year)
- Latency: <500ms for voice activation, <2s for chat
- Throughput: Support 500+ concurrent users
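The downtime figure in the availability target follows directly from the percentage. A short calculation shows where 8.76 hours comes from and lets the same budget be expressed per month. The function name and the 30-day month are illustrative choices.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def allowed_downtime_hours(availability_percent: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Downtime budget implied by an availability target over a period."""
    return (100.0 - availability_percent) / 100.0 * period_hours


yearly = round(allowed_downtime_hours(99.9), 2)                  # 8.76 hours per year
monthly_minutes = round(allowed_downtime_hours(99.9, 30 * 24) * 60, 1)  # 43.2 minutes per 30-day month
```

Seen per month, the 99.9% target leaves roughly 43 minutes for all outages combined, planned and unplanned, which is a useful check when sizing maintenance windows.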
Data Flow Examples
Example 1: Voice Interaction with Dynamic Clarification
1. User clicks "Connect" in web app (Nextcloud-hosted)
2. Web app authenticates with Nextcloud → gets JWT token
3. Web app establishes WebRTC connection to Voice Proxy service
4. Voice Proxy validates JWT with Auth Service
5. User speaks: "Search UpToDate for kidney disease management"
6. Voice Proxy:
- Detects end of speech (VAD)
- Sends audio to OpenAI Realtime API
- Transcription returned
7. Medical KB Service analyzes query:
- Detects ambiguity ("kidney disease" is broad)
- Generates clarification: "Do you mean acute kidney injury,
chronic kidney disease, diabetic kidney disease, or a specific
type like pre-renal, intrinsic, or post-renal?"
8. Voice Proxy speaks clarification back to user
9. User responds: "Chronic kidney disease"
10. Medical KB Service:
- Queries UpToDate API with refined query
- Retrieves relevant articles
- Generates structured summary with RAG
11. Voice Proxy speaks summary with citations
12. User: "Download the guideline PDF"
13. File Indexer Service:
- Downloads PDF
- Stores in Nextcloud
- Indexes for future queries
14. Voice Proxy: "Downloaded and saved to your Nextcloud files"
Example 2: Calendar Integration
1. User (via voice): "Add a meeting tomorrow at 2pm with Dr. Smith"
2. Voice Proxy → Medical KB Service → analyzes intent
3. Calendar/Email Service:
- Creates event in Nextcloud calendar (CalDAV)
- Syncs to linked Google Calendar (if configured)
- Sends email invite via Nextcloud Mail
4. Voice Proxy confirms: "Meeting created for tomorrow at 2pm and
invite sent to Dr. Smith"
Example 3: File Indexing
1. User uploads medical note PDF to Nextcloud
2. Nextcloud triggers webhook → File Indexer Service
3. File Indexer:
- Downloads file via WebDAV
- Extracts text (PyPDF2, pdfplumber, Tesseract)
- Chunks intelligently
- Classifies for PHI (PHI Detection Service)
- Generates embeddings (local if PHI, cloud if not)
- Stores in Qdrant with metadata
4. File becomes searchable in voice queries
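Two steps in this flow lend themselves to a small illustration: splitting extracted text into chunks and choosing where embeddings are computed. The sketch below uses fixed-size overlapping windows as a simple stand-in for the "intelligent" chunking mentioned above, whose actual strategy this document does not specify. Function names, chunk sizes, and the `"local"` / `"cloud"` labels are assumptions.

```python
from typing import List


def chunk_text(text: str, size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into windows of `size` characters that share `overlap` characters."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


def embedding_backend(contains_phi: bool) -> str:
    """PHI never leaves the deployment: embed locally if any PHI was detected."""
    return "local" if contains_phi else "cloud"
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk. Routing is decided per chunk after PHI classification, so a single document can be embedded partly locally and partly in the cloud.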
Security Architecture
Network Segmentation
┌─────────────────────────────────────────────────────────┐
│ Public Internet │
└───────────────────────┬─────────────────────────────────┘
│
Firewall (UFW)
Allow: 80, 443
│
┌───────────────────────┴─────────────────────────────────┐
│ DMZ (Ingress) │
│ Nextcloud + API Gateway │
└───────────────────────┬─────────────────────────────────┘
│
K8s Network Policies
│
┌───────────────────────┴─────────────────────────────────┐
│ Application Services (Mesh) │
│ mTLS enforced, no service-to-service direct access │
└───────────────────────┬─────────────────────────────────┘
│
Network Policies
│
┌───────────────────────┴─────────────────────────────────┐
│ Data Layer │
│ No external access, only from app services │
└─────────────────────────────────────────────────────────┘
Authentication Flow
┌──────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│ User │────────>│Nextcloud │─────>│ Keycloak │────>│ JWT │
└──────┘ Login └──────────┘ OIDC └──────────┘ └────┬─────┘
│
v
┌──────────────────────────┐
│ VoiceAssist Services │
│ Validate JWT + Check │
│ Permissions (OPA) │
└──────────────────────────┘
Authorization with Open Policy Agent

    # Example OPA policy
    package voiceassist.authz

    default allow = false

    # Admins can do anything
    allow {
        input.user.role == "admin"
    }

    # Users can access their own data
    allow {
        input.user.role == "user"
        input.resource.user_id == input.user.id
    }

    # Medical staff can access medical knowledge base
    allow {
        input.user.role == "medical_staff"
        input.action == "query_medical_kb"
    }
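For readers less familiar with Rego, the three rules in the policy above can be mirrored in plain Python. This mirror is only a reading aid and a convenient source of table-driven test cases; OPA remains the enforcement point, and the function below is not part of the system.

```python
def allow(request: dict) -> bool:
    """Plain-Python mirror of the example policy: deny unless a rule matches."""
    user = request.get("user", {})
    resource = request.get("resource", {})
    role = user.get("role")

    # Admins can do anything.
    if role == "admin":
        return True

    # Users can access their own data. As in Rego, a missing field on either
    # side means the rule does not match.
    if (role == "user" and "id" in user and "user_id" in resource
            and resource["user_id"] == user["id"]):
        return True

    # Medical staff can query the medical knowledge base.
    if role == "medical_staff" and request.get("action") == "query_medical_kb":
        return True

    return False
```

The final `return False` corresponds to `default allow = false`: any request that matches none of the rules, including one with missing fields, is denied.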
Observability
See OBSERVABILITY.md for complete observability patterns, logging conventions, and metrics definitions.
Metrics (Prometheus)
System Metrics:
- CPU, memory, disk, network per pod
- Request rate, latency, error rate per service
- Database connections, query latency
- Cache hit rate
Business Metrics:
- Active users
- Voice sessions
- Queries per minute
- Medical KB lookups
- API costs (OpenAI, UpToDate)
Distributed Tracing (Jaeger)
Trace Example: Voice Query
Span 1: Voice Proxy receives audio (10ms)
Span 2: OpenAI Realtime API call (300ms)
Span 3: Medical KB query (150ms)
  Span 4: Qdrant vector search (50ms, child of Span 3)
  Span 5: OpenAI GPT-4 summarization (100ms, child of Span 3)
Span 6: Response synthesis (20ms)
Total: 480ms (Spans 1 + 2 + 3 + 6; Spans 4 and 5 run inside Span 3)
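The 480ms total only adds up if the vector search and summarization spans are read as children of the Medical KB span, since summing all six durations gives 630ms. Under that assumption, and assuming the top-level spans run one after another, the end-to-end latency is the sum of the top-level spans. The `Span` type below is a hypothetical stand-in for what a tracing backend would report.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    name: str
    duration_ms: int
    parent: Optional[str] = None  # name of the enclosing span, if any


def end_to_end_ms(spans: List[Span]) -> int:
    """Sum top-level spans only; child time is already inside its parent."""
    return sum(s.duration_ms for s in spans if s.parent is None)


VOICE_QUERY = [
    Span("Voice Proxy receives audio", 10),
    Span("OpenAI Realtime API call", 300),
    Span("Medical KB query", 150),
    Span("Qdrant vector search", 50, parent="Medical KB query"),
    Span("OpenAI GPT-4 summarization", 100, parent="Medical KB query"),
    Span("Response synthesis", 20),
]

total = end_to_end_ms(VOICE_QUERY)  # 480
```

This reading also shows where the latency budget goes: the Realtime API call alone accounts for 300ms of the 500ms voice target.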
Logging (Loki)
Log Levels:
- DEBUG - Development only
- INFO - Normal operations
- WARN - Potential issues
- ERROR - Errors requiring attention
- CRITICAL - Service failures
Log Redaction:

    # Automatically redact PHI
    log.info(f"Processing request for user {redact_phi(user_id)}")
    # Output: "Processing request for user ****"
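The body of `redact_phi` is not shown in this document. One way to apply redaction consistently is a `logging.Filter` that rewrites every record before it is emitted, so individual call sites cannot forget it. The sketch below is a pattern-based stand-in with two illustrative patterns; it will not mask arbitrary identifiers the way the example output above does, and real PHI detection needs far broader coverage.

```python
import logging
import re

# Illustrative patterns only; these are assumptions, not the project's rules.
PHI_PATTERNS = [
    re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),  # email address
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),                            # US SSN format
]


def redact_phi(text: str, mask: str = "****") -> str:
    """Replace every match of a known PHI pattern with the mask."""
    for pattern in PHI_PATTERNS:
        text = pattern.sub(mask, text)
    return text


class PhiRedactionFilter(logging.Filter):
    """Redact the fully formatted message of each record, then let it through."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = redact_phi(record.getMessage())
        record.args = None  # arguments are already merged into msg
        return True
```

Attaching the filter to a handler (`handler.addFilter(PhiRedactionFilter())`) covers every logger that writes to it, which is usually safer than relying on each log statement to call `redact_phi` itself.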
Alerting (AlertManager)
Alert Rules:
- Service down > 1 minute
- Error rate > 5%
- Voice latency > 1 second (p95)
- Database connections > 80% of max
- Disk usage > 80%
- SSL certificate expiring < 30 days
Scaling Strategy
Horizontal Scaling

    # HorizontalPodAutoscaler example
    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: voice-proxy-hpa
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: voice-proxy
      minReplicas: 2
      maxReplicas: 10
      metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 70
        - type: Resource
          resource:
            name: memory
            target:
              type: Utilization
              averageUtilization: 80
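To see how the autoscaler example above translates utilization into a replica count, the core of the documented Kubernetes calculation is `ceil(currentReplicas * currentValue / targetValue)`, evaluated per metric, with the largest proposal winning and the result clamped to the configured bounds. The sketch below reproduces just that core; it deliberately omits the tolerance band and stabilization windows that the real controller applies, and the function name is hypothetical.

```python
import math
from typing import Dict, Tuple


def desired_replicas(current_replicas: int,
                     metrics: Dict[str, Tuple[float, float]],
                     min_replicas: int = 2,
                     max_replicas: int = 10) -> int:
    """metrics maps a metric name to (current utilization %, target utilization %)."""
    proposals = [
        math.ceil(current_replicas * current / target)
        for current, target in metrics.values()
    ]
    # The metric asking for the most replicas wins, within the min/max bounds.
    return max(min_replicas, min(max_replicas, max(proposals)))


# Two pods at 140% CPU against a 70% target, with memory exactly on target:
replicas = desired_replicas(2, {"cpu": (140, 70), "memory": (80, 80)})  # 4
```

Because the largest proposal wins, adding the memory metric can only increase the replica count relative to CPU alone; it never holds scaling back.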
Database Scaling
- Read replicas for read-heavy operations
- Connection pooling (PgBouncer)
- Query optimization and indexing
- Caching with Redis
Vertical Scaling
- Increase pod resource limits
- Use larger node instance types
- Add GPU nodes for AI workloads (if needed)
Technology Stack Summary
| Layer | Technology | Purpose |
|---|---|---|
| Orchestration | Kubernetes (K3s locally) | Container orchestration |
| Service Mesh | Linkerd | mTLS, observability, traffic management |
| Identity | Nextcloud + Keycloak | SSO, user management |
| API Gateway | Kong or Nginx | Routing, rate limiting, auth |
| Backend | Python FastAPI | Microservices |
| Frontend | React 18 + Vite (web-app, admin-panel), Next.js 14 (docs-site) | Web apps |
| Databases | PostgreSQL (pgvector), Redis, Qdrant | Data persistence, caching, vectors |
| AI/ML | OpenAI, BioGPT, PubMedBERT | LLM, medical models |
| Voice | Thinker-Talker (Deepgram + ElevenLabs), OpenAI Realtime (legacy) | Voice interaction |
| Observability | Prometheus, Grafana, Jaeger, Loki | Monitoring, metrics, tracing, logging |
| IaC | Terraform, Ansible | Infrastructure automation |
| CI/CD | GitHub Actions | Automated testing and deployment |
| Security | Let's Encrypt, OPA, mTLS | SSL, authorization, encryption |
Deployment Architecture
Local Development (MacBook Pro)
K3s Cluster (local)
├── Nextcloud (localhost:8080, separate stack)
├── API Gateway (localhost:8000)
├── Voice Proxy (localhost:8001)
├── Medical KB (localhost:8002)
├── Admin API (localhost:8003)
├── Auth Service (localhost:8004)
├── PostgreSQL (localhost:5432)
├── Redis (localhost:6379)
├── Qdrant (localhost:6333)
├── Prometheus (localhost:9090)
└── Grafana (localhost:3000)
Production (Ubuntu Server)
Kubernetes Cluster
├── Ingress (voiceassist.asimo.io)
│ └── SSL Termination
├── Service Mesh (Linkerd)
├── Microservices (2-10 replicas each)
│ ├── Nextcloud
│ ├── API Gateway
│ ├── Voice Proxy
│ ├── Medical KB
│ ├── Admin API
│ └── ... (all services)
├── Data Layer
│ ├── PostgreSQL Primary + 2 Replicas
│ ├── Redis Cluster (3 masters, 3 slaves)
│ └── Qdrant (3 replicas)
└── Observability Stack
├── Prometheus
├── Grafana
├── Jaeger
└── Loki
Next Steps
- Read `SECURITY_COMPLIANCE.md` for HIPAA details
- Read `NEXTCLOUD_INTEGRATION.md` for Nextcloud specifics
- Review `DEVELOPMENT_PHASES_V2.md` for implementation plan
- Start with Phase 0: Project Initialization