Docs / Raw

Unified Architecture Documentation

Sourced from docs/UNIFIED_ARCHITECTURE.md

Edit on GitHub

VoiceAssist V2 - Unified Architecture Documentation

Last Updated: 2025-12-02 (All 16 Phases Complete) Status: Canonical Reference Purpose: Comprehensive system architecture covering all components, data flows, and integration points


Table of Contents

  1. Executive Summary
  2. System Overview
  3. Architecture Principles
  4. Current Implementation Status
  5. Component Architecture
  6. Data Architecture
  7. Integration Architecture
  8. Security Architecture
  9. Deployment Architecture
  10. Observability Architecture
  11. Data Flow Examples
  12. Technology Stack
  13. Architecture Evolution
  14. Design Decisions and Trade-offs

Executive Summary

VoiceAssist V2 is an enterprise-grade, HIPAA-compliant medical AI assistant designed to support clinical decision-making through voice and text interfaces. The system has completed all 16 phases (0-15) with progressive architecture:

  • Phases 0-10: Monorepo-first backend with Docker Compose orchestration
  • Phases 11-14: Security hardening, HA/DR, testing, production deployment
  • Phase 15: Final review and handoff

Current Capabilities (all phases complete):

  • ✅ JWT-based authentication with token revocation
  • ✅ Role-based access control (RBAC) for admin operations
  • ✅ RAG-powered medical knowledge base with semantic search
  • ✅ Real-time WebSocket communication for streaming responses
  • ✅ Nextcloud integration (CalDAV, WebDAV, file auto-indexing)
  • ✅ Multi-level caching (L1 in-memory + L2 Redis)
  • ✅ Comprehensive observability (Prometheus metrics, structured logging, SLOs)
  • ✅ Admin panel with system monitoring dashboard
  • ✅ Async background job processing for document indexing

Design Philosophy: Start simple (monorepo), maintain clear boundaries (logical services), scale when needed (microservices extraction).


System Overview

High-Level Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        Users (Web/Mobile)                        │
│                  Browser / Mobile Apps / Web UI                  │
└────────────────┬────────────────────┬────────────────────────────┘
                 │                    │
          ┌──────┴──────┐      ┌──────┴──────┐
          │             │      │             │
          v             │      v             │
┌───────────────────┐   │  ┌──────────────────────────────────────┐
│  Nextcloud Stack  │   │  │    VoiceAssist Backend Stack         │
│  (Separate)       │   │  │    (This Repository)                 │
│                   │   │  │                                      │
│  - Identity/SSO   │◄──┼──│  API Gateway (FastAPI)               │
│  - File Storage   │   │  │  Port: 8000                          │
│  - Calendar       │   │  │                                      │
│  - Email          │   │  │  Logical Services (Phases 0-7):      │
│  - User Directory │   │  │  - Auth Service (JWT + RBAC)         │
│                   │   │  │  - Realtime Service (WebSocket)      │
│  Local Dev:       │   │  │  - RAG Service (QueryOrchestrator)   │
│  Port 8080        │   │  │  - Admin Service (Dashboard + Mgmt)  │
│                   │   │  │  - KB Indexer (Document Ingestion)   │
│  Production:      │   │  │  - Integration Service (CalDAV/File) │
│  cloud.asimo.io   │   │  │  - Cache Service (L1+L2)             │
└───────────────────┘   │  │  - Audit Service (Compliance)        │
                        │  │                                      │
                        │  │  Background Workers (ARQ):           │
                        │  │  - Document Indexing Jobs            │
                        │  │  - File Auto-Indexing                │
                        │  └──────────────────────────────────────┘
                        │
                        │  HTTPS / OIDC / WebDAV / CalDAV APIs
                        │
                        v
┌──────────────────────────────────────────────────────────────────┐
│                      Data Layer (Docker Compose)                 │
│                                                                   │
│  ┌──────────────────┐  ┌──────────────┐  ┌─────────────────┐   │
│  │  PostgreSQL      │  │  Redis       │  │  Qdrant         │   │
│  │  (pgvector)      │  │  (6 DBs)     │  │  (Vectors)      │   │
│  │                  │  │              │  │                 │   │
│  │  Tables:         │  │  DB 0: Cache │  │  Collection:    │   │
│  │  - users         │  │  DB 1: Queue │  │  - medical_kb   │   │
│  │  - sessions      │  │  DB 2: L2    │  │                 │   │
│  │  - messages      │  │  DB 3: Token │  │  Embedding:     │   │
│  │  - documents     │  │  DB 15: Test │  │  - 1536 dims    │   │
│  │  - audit_logs    │  │              │  │  - Cosine sim   │   │
│  └──────────────────┘  └──────────────┘  └─────────────────┘   │
└──────────────────────────────────────────────────────────────────┘
                            │
┌───────────────────────────┴───────────────────────────────────┐
│                 Observability Stack (Docker Compose)           │
│  ┌────────────┬────────────┬────────────┬───────────────┐    │
│  │ Prometheus │  Grafana   │  (Jaeger)  │  Loki (Logs)  │    │
│  │            │            │  (Future)  │  (Future)     │    │
│  │  Metrics:  │ Dashboards:│            │               │    │
│  │  - SLOs    │ - Health   │            │               │    │
│  │  - Cache   │ - SLOs     │            │               │    │
│  │  - RAG     │ - Security │            │               │    │
│  │  - RBAC    │            │            │               │    │
│  └────────────┴────────────┴────────────┴───────────────┘    │
└─────────────────────────────────────────────────────────────────┘

Key Architectural Separation

Nextcloud is a separate stack, not part of VoiceAssist deployment.

Local Development:

MacBook Pro
├── ~/Nextcloud-Dev/                    # Separate Nextcloud Stack
│   ├── docker-compose.yml              # Nextcloud + DB
│   └── Running at: http://localhost:8080
│
└── ~/VoiceAssist/                      # VoiceAssist Stack
    ├── docker-compose.yml              # All VoiceAssist services
    └── Running at: http://localhost:8000
    └── Connects via: NEXTCLOUD_BASE_URL=http://localhost:8080

Integration Pattern:

  • VoiceAssist services are clients of Nextcloud
  • Communication via HTTP/HTTPS APIs (OIDC, WebDAV, CalDAV, CardDAV)
  • No shared Docker Compose project, no shared databases
  • Environment variables configure the connection

Architecture Principles

1. Progressive Complexity

Start Simple: Begin with monorepo for rapid development Maintain Boundaries: Enforce logical service boundaries even in monorepo Scale When Needed: Extract to microservices only when scaling requires it

Decision Matrix:

FactorMonorepo (Current)Microservices (Future)
Team Size< 5 developers> 5 developers
Concurrent Users< 50 users> 50 users
DeploymentSingle serverMulti-node K8s cluster
Development SpeedFaster (single codebase)Slower (coordination overhead)
Operational ComplexityLow (Docker Compose)High (K8s, service mesh)

2. Security by Design

  • Zero-trust model: Never trust, always verify
  • PHI protection: Never log PHI, automatic redaction
  • Least privilege: RBAC with granular permissions
  • Encryption everywhere: TLS in transit, encryption at rest
  • Audit everything: Immutable audit logs for all sensitive operations

3. Observability First

  • Metrics: Prometheus for performance and SLO tracking
  • Logs: Structured JSON with correlation IDs
  • Tracing: Request context propagation (future: OpenTelemetry)
  • Dashboards: Grafana for real-time system health
  • Alerts: Multi-window, multi-burn-rate SLO alerting

4. API-First Design

  • Standard envelope: Consistent response format across all endpoints
  • Error codes: Typed error codes for client error handling
  • Versioning: API version in URL path (/api/v1/...)
  • Documentation: OpenAPI/Swagger auto-generated from code

5. Performance Optimization

  • Multi-level caching: L1 (LRU in-memory) + L2 (Redis distributed)
  • Connection pooling: Efficient database and API client connections
  • Async processing: Background jobs for long-running tasks
  • Query optimization: Indexed database queries, vector search tuning

Current Implementation Status

Phase Completion Summary

All 16 project phases (0-15) are complete. See Implementation Status for detailed component status.

PhaseStatusKey Deliverables
Phase 0✅ CompleteProject structure, Docker Compose, base infrastructure
Phase 1✅ CompletePostgreSQL, Redis, Qdrant, health endpoints, Alembic migrations
Phase 2✅ CompleteJWT auth, password validation, token revocation, Nextcloud integration
Phase 3✅ CompleteAPI Gateway solidified, core endpoints, service boundaries
Phase 4✅ CompleteWebSocket realtime communication, QueryOrchestrator integration
Phase 5✅ CompleteRAG pipeline, semantic search, document ingestion, OpenAI embeddings
Phase 6✅ CompleteCalDAV calendar, WebDAV file indexing, email skeleton
Phase 7✅ CompleteRBAC enforcement, admin panel dashboard, smoke tests
Phase 8✅ CompleteDistributed tracing, observability infrastructure
Phase 9✅ CompleteInfrastructure as code, CI/CD pipelines
Phase 10✅ CompleteLoad testing, performance optimization
Phase 11✅ CompleteSecurity hardening, HIPAA compliance
Phase 12✅ CompleteHigh availability, disaster recovery
Phase 13✅ CompleteFinal testing, documentation
Phase 14✅ CompleteProduction deployment
Phase 15✅ CompleteFinal review and handoff

Completed Features

Authentication & Authorization:

  • ✅ User registration with password strength validation
  • ✅ JWT access tokens (15-min) + refresh tokens (7-day)
  • ✅ Token revocation via Redis (dual-level: individual + all-user)
  • ✅ Role-based access control (admin vs user)
  • ✅ Admin-only endpoints protected with get_current_admin_user dependency
  • ✅ Comprehensive audit logging (SHA-256 integrity verification)

Medical AI & Knowledge Base:

  • ✅ Document upload (PDF, TXT support)
  • ✅ Text extraction and intelligent chunking
  • ✅ OpenAI embeddings (text-embedding-3-small, 1536 dimensions)
  • ✅ Qdrant vector storage with cosine similarity
  • ✅ RAG pipeline with context retrieval and citation tracking
  • ✅ QueryOrchestrator with LLM integration
  • ✅ Streaming responses via WebSocket

Nextcloud Integration:

  • ✅ CalDAV calendar operations (list, create, update, delete events)
  • ✅ WebDAV file discovery and auto-indexing
  • ✅ Automatic knowledge base population from Nextcloud files
  • ✅ Duplicate prevention for re-indexing

Observability & Operations:

  • ✅ Prometheus metrics (cache, RAG, RBAC, HTTP, DB)
  • ✅ Multi-level caching with hit/miss tracking
  • ✅ SLO definitions (availability, latency, cache performance)
  • ✅ SLO recording rules and alerting (Prometheus)
  • ✅ Grafana dashboards (health, SLOs, security audit)
  • ✅ Admin panel dashboard with system summary

Infrastructure:

  • ✅ Docker Compose orchestration
  • ✅ PostgreSQL with pgvector extension
  • ✅ Redis with multiple databases (cache, queue, L2, token revocation)
  • ✅ Qdrant vector database
  • ✅ ARQ async job queue for background processing
  • ✅ Alembic database migrations

Future Enhancements (Optional)

The following features are candidates for future enhancement beyond the current implementation:

  • ⏳ OIDC authentication integration (Nextcloud SSO)
  • ⏳ Per-user credential management
  • ⏳ Complete email integration (threading, search, attachments)
  • ⏳ CardDAV contacts integration
  • ⏳ BioGPT/PubMedBERT specialized medical models
  • ⏳ Multi-hop reasoning and complex retrieval strategies
  • ⏳ External integrations (UpToDate, OpenEvidence, PubMed live APIs)
  • ⏳ Microservices extraction (when scaling requires)

Component Architecture

Monorepo Structure

VoiceAssist/
├── services/
│   └── api-gateway/              # Main FastAPI application
│       ├── app/
│       │   ├── main.py           # Application entry point
│       │   ├── api/              # API routes (FastAPI routers)
│       │   │   ├── auth.py       # Authentication endpoints
│       │   │   ├── users.py      # User management
│       │   │   ├── realtime.py   # WebSocket endpoint
│       │   │   ├── admin_kb.py   # Admin KB management
│       │   │   ├── admin_panel.py # Admin dashboard
│       │   │   ├── integrations.py # Nextcloud integrations
│       │   │   └── metrics.py    # Prometheus metrics
│       │   ├── services/         # Business logic layer
│       │   │   ├── rag_service.py         # QueryOrchestrator (RAG pipeline)
│       │   │   ├── llm_client.py          # LLM interface
│       │   │   ├── kb_indexer.py          # Document ingestion
│       │   │   ├── search_aggregator.py   # Semantic search
│       │   │   ├── cache_service.py       # Multi-level caching
│       │   │   ├── audit_service.py       # Audit logging
│       │   │   ├── caldav_service.py      # Calendar integration
│       │   │   ├── nextcloud_file_indexer.py # File auto-indexing
│       │   │   ├── email_service.py       # Email skeleton
│       │   │   └── token_revocation.py    # Token blacklist
│       │   ├── models/           # SQLAlchemy ORM models
│       │   │   ├── user.py
│       │   │   ├── session.py
│       │   │   ├── message.py
│       │   │   └── audit_log.py
│       │   ├── core/             # Core infrastructure
│       │   │   ├── config.py     # Settings (Pydantic)
│       │   │   ├── database.py   # DB session management
│       │   │   ├── security.py   # JWT, password hashing
│       │   │   ├── dependencies.py # FastAPI dependencies
│       │   │   ├── api_envelope.py # Standard response format
│       │   │   ├── metrics.py    # Prometheus metrics definitions
│       │   │   ├── request_id.py # Request correlation
│       │   │   └── password_validator.py # Password strength
│       │   └── worker/           # Background job processing
│       │       ├── tasks.py      # ARQ tasks (document indexing)
│       │       └── worker.py     # ARQ worker entrypoint
│       ├── tests/
│       │   ├── unit/
│       │   ├── integration/
│       │   └── e2e/              # End-to-end tests (Phase 7)
│       ├── alembic/              # Database migrations
│       ├── requirements.txt
│       └── Dockerfile
├── infrastructure/
│   └── observability/
│       ├── prometheus/
│       │   ├── prometheus.yml
│       │   └── rules/
│       │       ├── slo_recording_rules.yml
│       │       └── slo_alerts.yml
│       └── grafana/
│           └── dashboards/
│               ├── health-monitoring.json
│               ├── slo-overview.json
│               └── security-audit.json
├── docs/                         # Documentation
│   ├── UNIFIED_ARCHITECTURE.md   # This document
│   ├── SERVICE_CATALOG.md
│   ├── DATA_MODEL.md
│   ├── operations/
│   │   └── SLO_DEFINITIONS.md
│   └── testing/
│       └── E2E_TESTING_GUIDE.md
├── docker-compose.yml            # Service orchestration
├── .env                          # Environment configuration
└── PHASE_STATUS.md               # Development status

Logical Service Boundaries

Even in monorepo, services maintain strict boundaries:

ServiceModule LocationResponsibilityDependencies
Auth Serviceapp/api/auth.py + app/core/security.pyUser registration, login, JWT tokens, RBACPostgreSQL, Redis (token revocation)
Realtime Serviceapp/api/realtime.pyWebSocket endpoint, streaming responsesQueryOrchestrator, LLMClient
Voice Pipeline Serviceapp/services/voice_pipeline_service.pyThinker-Talker voice orchestrationThinkerService, TalkerService
Thinker Serviceapp/services/thinker_service.pyLLM processing with tool/RAG supportLLMClient, RAGService
Talker Serviceapp/services/talker_service.pyTTS audio generation (ElevenLabs)ElevenLabsService
RAG Serviceapp/services/rag_service.pyQuery orchestration, RAG pipelineSearchAggregator, LLMClient, Qdrant
KB Indexerapp/services/kb_indexer.pyDocument ingestion, chunking, embeddingOpenAI API, Qdrant, PostgreSQL
Search Aggregatorapp/services/search_aggregator.pySemantic search, citation extractionQdrant, CacheService
Cache Serviceapp/services/cache_service.pyMulti-level caching (L1 + L2)Redis
Admin Serviceapp/api/admin_kb.py + app/api/admin_panel.pySystem management, dashboardAll services (monitoring)
Integration Serviceapp/api/integrations.py + app/services/caldav_service.pyNextcloud integrationsNextcloud APIs (CalDAV, WebDAV)
Audit Serviceapp/services/audit_service.pyCompliance logging, integrity verificationPostgreSQL
Worker Serviceapp/worker/Async background jobsRedis (ARQ), KBIndexer

Voice Architecture: The Thinker-Talker pipeline is the primary voice implementation. See Voice Mode Pipeline for details.

Service Communication Patterns

Synchronous (Direct Function Calls in Monorepo):

  • API routes → Service layer
  • Service → Service (internal imports)
  • Service → Database (SQLAlchemy)
  • Service → External APIs (HTTP clients)

Asynchronous (Background Jobs via ARQ):

  • Document indexing jobs
  • File auto-indexing from Nextcloud
  • Future: Email sending, scheduled tasks

Future (Microservices - Phases 11-14):

  • HTTP/REST between services
  • gRPC for high-performance internal communication
  • Message queue (RabbitMQ/Kafka) for async events

Data Architecture

Database Schema

PostgreSQL Tables (Alembic managed):

-- User Management CREATE TABLE users ( id UUID PRIMARY KEY, email VARCHAR(255) UNIQUE NOT NULL, hashed_password VARCHAR(255) NOT NULL, is_active BOOLEAN DEFAULT TRUE, is_admin BOOLEAN DEFAULT FALSE, created_at TIMESTAMP NOT NULL, updated_at TIMESTAMP NOT NULL ); -- Session Management CREATE TABLE sessions ( id UUID PRIMARY KEY, user_id UUID REFERENCES users(id), created_at TIMESTAMP NOT NULL, last_activity TIMESTAMP NOT NULL ); -- Conversation Messages CREATE TABLE messages ( id UUID PRIMARY KEY, session_id UUID REFERENCES sessions(id), user_id UUID REFERENCES users(id), role VARCHAR(50) NOT NULL, -- user, assistant, system content TEXT NOT NULL, created_at TIMESTAMP NOT NULL ); -- Audit Logs (HIPAA Compliance) CREATE TABLE audit_logs ( id UUID PRIMARY KEY, user_id UUID REFERENCES users(id), action VARCHAR(100) NOT NULL, resource_type VARCHAR(100), resource_id VARCHAR(255), ip_address VARCHAR(45), user_agent TEXT, request_id VARCHAR(255), service_name VARCHAR(100), endpoint VARCHAR(255), status_code INTEGER, success BOOLEAN NOT NULL, error_message TEXT, metadata JSONB, integrity_hash VARCHAR(64) NOT NULL, -- SHA-256 created_at TIMESTAMP NOT NULL ); CREATE INDEX idx_audit_logs_user_id ON audit_logs(user_id); CREATE INDEX idx_audit_logs_action ON audit_logs(action); CREATE INDEX idx_audit_logs_created_at ON audit_logs(created_at); CREATE INDEX idx_audit_logs_resource ON audit_logs(resource_type, resource_id);

Redis Database Organization

Redis Databases (0-15):

DBPurposeTTLKeys
0General cachingVaries (15min-24h)cache:*, user:*
1ARQ job queueN/Aarq:*
2L2 cache (multi-level)Variescache:l2:*
3Token revocationToken expirytoken:revoked:*, user:revoked:*
15Test databaseN/A(cleared after tests)

Qdrant Vector Database

Collection: medical_knowledge

{ "collection_name": "medical_knowledge", "vectors": { "size": 1536, # OpenAI text-embedding-3-small "distance": "Cosine" }, "payload_schema": { "document_id": "keyword", "chunk_index": "integer", "source_type": "keyword", # textbook, journal, guideline, note "title": "text", "content": "text", "metadata": "json" } }

Data Flow Architecture

Document Ingestion Flow:

File Upload → KBIndexer →
  1. Text Extraction (PyPDF2/pdfplumber)
  2. Chunking (500 chars, 50 overlap)
  3. Embedding Generation (OpenAI API)
  4. Vector Storage (Qdrant)
  5. Metadata Storage (PostgreSQL - future)
  6. Cache Invalidation

RAG Query Flow:

User Query → QueryOrchestrator →
  1. Check L1 Cache (embedding)
  2. Check L2 Cache (embedding)
  3. Generate Embedding (OpenAI API)
  4. Store in Cache (L2 + L1)
  5. Vector Search (Qdrant)
  6. Format Context
  7. LLM Generation (OpenAI GPT-4)
  8. Citation Extraction
  9. Response Streaming (WebSocket)

Authentication Flow:

Login Request → Auth API →
  1. Validate Credentials (bcrypt)
  2. Generate JWT Tokens (access + refresh)
  3. Store Session (PostgreSQL)
  4. Audit Log (audit_logs table)
  5. Return Tokens

Integration Architecture

Nextcloud Integration Pattern

Architecture Decision: Nextcloud is a separate deployment, VoiceAssist is a client.

Integration Points:

  1. CalDAV (Calendar)

    • Protocol: CalDAV (RFC 4791)
    • Library: caldav Python library
    • Operations: List calendars, create/update/delete events
    • Location: app/services/caldav_service.py
  2. WebDAV (Files)

    • Protocol: WebDAV (RFC 4918)
    • Library: webdavclient3
    • Operations: Discover files, download for indexing
    • Location: app/services/nextcloud_file_indexer.py
  3. OIDC (Authentication - Future)

    • Protocol: OpenID Connect
    • Flow: Authorization code flow
    • Provider: Nextcloud OIDC app
    • Status: Deferred to Phase 8+

Environment Configuration:

# Nextcloud Connection NEXTCLOUD_BASE_URL=http://localhost:8080 # or https://cloud.asimo.io NEXTCLOUD_ADMIN_USER=admin NEXTCLOUD_ADMIN_PASSWORD=secure_password # CalDAV NEXTCLOUD_CALDAV_URL=${NEXTCLOUD_BASE_URL}/remote.php/dav/calendars # WebDAV NEXTCLOUD_WEBDAV_URL=${NEXTCLOUD_BASE_URL}/remote.php/dav/files # OIDC (Future) NEXTCLOUD_OIDC_ISSUER=${NEXTCLOUD_BASE_URL}/apps/oidc NEXTCLOUD_CLIENT_ID=voiceassist NEXTCLOUD_CLIENT_SECRET=<from_nextcloud>

External API Integrations

OpenAI API:

  • Embeddings: text-embedding-3-small (1536 dimensions)
  • LLM: gpt-4-turbo-preview (configurable)
  • Usage: Document embedding, RAG response generation
  • Rate limiting: Handled by OpenAI client

Future Integrations (Phases 8+):

  • PubMed E-utilities API (medical literature search)
  • UpToDate API (evidence-based clinical references)
  • OpenEvidence API (guideline summaries)
  • Medical calculator libraries

Security Architecture

Authentication & Authorization

JWT Token Strategy:

  • Access Token: 15-minute expiry, HS256 algorithm
  • Refresh Token: 7-day expiry, HS256 algorithm
  • Token Revocation: Redis-based blacklist (individual + all-user-tokens)
  • Claims: sub (user_id), email, role, exp, iat, type

Password Security:

  • Hashing: bcrypt via passlib
  • Validation: Multi-criteria (8+ chars, upper, lower, digit, special)
  • Strength Scoring: 0-100 scale with Weak/Medium/Strong classification
  • Common Password Rejection: Blocks password, 123456, qwerty, etc.

RBAC (Role-Based Access Control):

  • Roles: admin, user (more roles in future phases)
  • Admin Enforcement: get_current_admin_user dependency
  • Protected Endpoints:
    • /api/admin/kb/* - Knowledge base management
    • /api/admin/panel/* - System dashboard
    • /api/integrations/* - Nextcloud integrations

Audit Logging

Compliance Features:

  • Immutable Trail: SHA-256 integrity hash on each log entry
  • Comprehensive Metadata: User, action, resource, timestamp, IP, user agent
  • Request Correlation: Request ID for distributed tracing
  • Tamper Detection: Integrity verification queries
  • HIPAA Alignment: Meets audit trail requirements

Logged Events:

  • User registration, login, logout
  • Token refresh, token revocation
  • Password changes, failed authentication
  • Admin operations (KB management, system config)
  • Document access and modifications

Data Protection

Encryption:

  • In Transit: HTTPS/TLS 1.2+ (production)
  • At Rest: Database-level encryption (future: PostgreSQL transparent encryption)
  • Tokens: JWT with signed claims
  • Passwords: bcrypt hashing (cost factor: 12)

PHI Protection (Future):

  • PHI detection service (Phase 8+)
  • Automatic log redaction
  • Local vs cloud AI routing based on PHI presence
  • Separate encryption keys for PHI data

Network Security

Docker Compose Network Isolation:

networks: voiceassist_network: driver: bridge internal: false # API gateway needs external access voiceassist_internal: driver: bridge internal: true # Database layer isolated

Future (Kubernetes - Phases 11-14):

  • Network policies for pod-to-pod restrictions
  • Service mesh (Linkerd) for mTLS
  • Ingress controller with WAF (Web Application Firewall)

Deployment Architecture

Development Environment (Docker Compose)

Current Stack:

# docker-compose.yml services: # Application Services voiceassist-server: build: ./services/api-gateway ports: ["8000:8000"] depends_on: [postgres, redis, qdrant] voiceassist-worker: build: ./services/api-gateway command: ["python", "-m", "app.worker.worker"] depends_on: [redis] # Data Layer postgres: image: pgvector/pgvector:pg16 ports: ["5432:5432"] volumes: [postgres_data:/var/lib/postgresql/data] redis: image: redis:7-alpine ports: ["6379:6379"] volumes: [redis_data:/data] qdrant: image: qdrant/qdrant:latest ports: ["6333:6333"] volumes: [qdrant_data:/qdrant/storage] # Observability (Phase 7+) prometheus: image: prom/prometheus:latest ports: ["9090:9090"] volumes: - ./infrastructure/observability/prometheus:/etc/prometheus grafana: image: grafana/grafana:latest ports: ["3000:3000"] volumes: - ./infrastructure/observability/grafana:/etc/grafana

Resource Allocation:

  • PostgreSQL: 2 CPU, 4GB RAM
  • Redis: 1 CPU, 1GB RAM
  • Qdrant: 2 CPU, 4GB RAM
  • API Gateway: 2 CPU, 4GB RAM
  • Worker: 1 CPU, 2GB RAM

Production Deployment (Future - Kubernetes)

Planned Architecture (Phases 11-14):

Kubernetes Cluster
├── Ingress (voiceassist.asimo.io)
│   └── SSL Termination (Let's Encrypt)
├── Service Mesh (Linkerd)
│   └── mTLS between all services
├── Microservices (2-10 replicas each)
│   ├── API Gateway (Kong/Nginx)
│   ├── Auth Service
│   ├── Realtime Service
│   ├── RAG Service
│   ├── Admin Service
│   └── Integration Service
├── Data Layer
│   ├── PostgreSQL (Primary + 2 Read Replicas)
│   ├── Redis Cluster (3 masters, 3 replicas)
│   └── Qdrant (3 replicas)
└── Observability
    ├── Prometheus (HA pair)
    ├── Grafana
    ├── Jaeger (distributed tracing)
    └── Loki (log aggregation)

Observability Architecture

Metrics Collection (Prometheus)

Instrumentation:

  • HTTP Metrics: Request count, latency (p50, p95, p99), status codes
  • Cache Metrics: Hit/miss rates by layer (L1, L2), size, evictions
  • RAG Metrics: Query latency, embedding generation time, search results
  • RBAC Metrics: Protected endpoint access, admin operations
  • Database Metrics: Connection pool utilization, query latency
  • External API Metrics: OpenAI call latency, rate limits

Metrics Endpoint:

  • Location: GET /metrics
  • Format: Prometheus exposition format
  • Protection: Optional authentication (configurable)

Service Level Objectives (SLOs)

Defined SLOs (Phase 7):

SLOTargetError BudgetMeasurement Window
API Availability99.9%43.2 min/month30 days
API Latency (P95)< 500ms-5 minutes
RAG Query Success99%1% failures24 hours
Cache Hit Rate> 40%-1 hour
Database P95 Latency< 100ms-5 minutes

Prometheus Recording Rules:

# API Availability (30-day) - record: slo:api_availability:ratio_rate30d expr: | sum(rate(voiceassist_http_requests_total{status_code=~"2..|3.."}[30d])) / sum(rate(voiceassist_http_requests_total[30d])) # Error Budget Remaining - record: slo:error_budget_remaining:percent expr: | 100 * (1 - ((1 - slo:api_availability:ratio_rate30d) / 0.001))

Alerting:

  • Multi-window, multi-burn-rate approach (Google SRE guidelines)
  • Critical alerts: SLO violations (< 99.9% availability)
  • Warning alerts: Error budget burn rate > 14.4x
  • Info alerts: Informational notifications

Logging Strategy

Structured Logging:

logger.info("user_login_success", extra={ "user_id": user.id, "email": user.email, "ip_address": request.client.host, "request_id": request.state.request_id, "timestamp": datetime.utcnow().isoformat() })

Log Levels:

  • DEBUG: Development only (not in production)
  • INFO: Normal operations, audit events
  • WARNING: Potential issues, deprecated API usage
  • ERROR: Errors requiring attention
  • CRITICAL: Service failures

Log Aggregation (Future - Loki):

  • Centralized log storage
  • Full-text search
  • Log correlation by request ID
  • PHI redaction applied automatically

Dashboards (Grafana)

Implemented Dashboards (Phase 7):

  1. Health Monitoring Dashboard (health-monitoring.json)

    • System overview (CPU, memory, disk)
    • Service health status
    • Database connection pool
    • Redis memory usage
    • Qdrant storage
  2. SLO Overview Dashboard (slo-overview.json)

    • API availability (30d)
    • Error budget remaining
    • Error budget burn rate
    • API latency (P50, P95, P99)
    • Cache hit rates
  3. Security Audit Dashboard (security-audit.json)

    • Recent authentication events
    • Failed login attempts
    • Token revocations
    • Admin operations
    • Audit log integrity status

Data Flow Examples

Example 1: User Registration and Login

1. User Registration
   ├─> POST /api/auth/register {email, password}
   ├─> Password Validator: Check strength
   ├─> User Model: Create with bcrypt hash
   ├─> PostgreSQL: Insert into users table
   ├─> Audit Service: Log registration event
   └─> Response: {user_id, email}

2. User Login
   ├─> POST /api/auth/login {email, password}
   ├─> User Model: Query by email
   ├─> Security Service: Verify password (bcrypt)
   ├─> Token Service: Generate JWT tokens (access + refresh)
   ├─> Session Model: Create session record
   ├─> Audit Service: Log login event
   └─> Response: {access_token, refresh_token, user}

3. Authenticated Request
   ├─> GET /api/auth/me
   ├─> Header: Authorization: Bearer <access_token>
   ├─> Dependency: get_current_user
   ├─> Token Service: Decode and validate JWT
   ├─> Token Revocation: Check Redis blacklist
   ├─> User Model: Query user details
   └─> Response: {user}

Example 2: RAG Query with Caching

1. User Query via WebSocket
   ├─> WS /api/realtime/ws
   ├─> Client: {"type": "message", "content": "What is diabetic ketoacidosis?"}
   ├─> Realtime Service: Parse and validate
   └─> Forward to QueryOrchestrator

2. RAG Pipeline
   ├─> QueryOrchestrator: handle_query()
   ├─> SearchAggregator: generate_query_embedding()
   │   ├─> CacheService: Check L1 cache (LRU)
   │   ├─> CacheService: Check L2 cache (Redis)
   │   ├─> Cache Miss → OpenAI API: Create embedding
   │   └─> CacheService: Store in L2 + L1 (24h TTL)
   ├─> SearchAggregator: search() in Qdrant
   │   ├─> Qdrant: Cosine similarity search (top_k=5)
   │   └─> Return: List[SearchResult]
   ├─> SearchAggregator: format_context_for_rag()
   ├─> LLMClient: generate() with context
   │   └─> OpenAI API: GPT-4 generation
   └─> SearchAggregator: extract_citations()

3. Streaming Response
   ├─> Realtime Service: Stream response chunks
   │   ├─> Send: {"type": "message_start", "message_id": "..."}
   │   ├─> Send: {"type": "message_chunk", "content": "Diabetic..."}
   │   ├─> Send: {"type": "message_chunk", "content": " ketoacidosis..."}
   │   └─> Send: {"type": "message_complete", "citations": [...]}
   └─> Client: Receives streaming response

Example 3: Document Upload and Indexing

1. Admin Upload
   ├─> POST /api/admin/kb/documents
   ├─> Dependency: get_current_admin_user (RBAC check)
   ├─> File: multipart/form-data (PDF or TXT)
   └─> Forward to KBIndexer

2. Document Processing
   ├─> KBIndexer: index_pdf_document() or index_document()
   ├─> Text Extraction: PyPDF2 or pdfplumber
   ├─> Chunking: 500 chars, 50 overlap
   ├─> For each chunk:
   │   ├─> OpenAI API: Create embedding (1536 dims)
   │   ├─> Qdrant: Store vector with metadata
   │   │   └─> Payload: {document_id, chunk_index, title, content, source_type}
   │   └─> Metrics: Track chunks_indexed
   └─> Return: IndexingResult {document_id, chunks_indexed, success}

3. Response to Admin
   ├─> Success Envelope: {success: true, data: {...}}
   ├─> Cache Invalidation: Clear L1 + L2 caches
   ├─> Audit Log: Document upload event
   └─> Prometheus Metrics: Increment kb_documents_indexed_total

Example 4: Calendar Event Creation via Nextcloud

1. Create Event Request
   ├─> POST /api/integrations/calendar/events
   ├─> Dependency: get_current_user (authentication)
   ├─> Body: {summary, start, end, description, location}
   └─> Forward to CalDAVService

2. CalDAV Integration
   ├─> CalDAVService: create_event()
   ├─> Connect to Nextcloud CalDAV
   │   └─> URL: {NEXTCLOUD_BASE_URL}/remote.php/dav/calendars/{user}/default
   ├─> Create iCalendar event (vobject)
   │   └─> VEVENT with SUMMARY, DTSTART, DTEND, DESCRIPTION, LOCATION
   ├─> Save to Nextcloud calendar
   └─> Return: Event UID

3. Response
   ├─> Success Envelope: {success: true, data: {event_uid: "..."}}
   ├─> Future: Send notification to user
   └─> Audit Log: Calendar event created

Technology Stack

Backend

ComponentTechnologyVersionPurpose
LanguagePython3.11+Primary backend language
FrameworkFastAPI0.104+Async web framework
ORMSQLAlchemy2.0+Database ORM
MigrationsAlembic1.12+Database schema versioning
ValidationPydantic2.4+Data validation and settings
Authenticationpython-jose3.3+JWT token handling
Password Hashingpasslib1.7+bcrypt hashing
HTTP Clienthttpx0.25+Async HTTP requests
Job QueueARQ0.25+Async background jobs

Databases & Storage

ComponentTechnologyVersionPurpose
RDBMSPostgreSQL16Primary relational database
Vector Extensionpgvector0.5+Vector storage in PostgreSQL
Cache/QueueRedis7+Caching, sessions, job queue
Vector DBQdrant1.7+Semantic search

AI & ML

ComponentTechnologyPurpose
EmbeddingsOpenAI text-embedding-3-small1536-dim embeddings
LLMOpenAI GPT-4 TurboResponse generation
FutureBioGPT, PubMedBERTMedical-specific models

Integrations

ComponentTechnologyPurpose
Calendarcaldav (Python library)CalDAV protocol support
Fileswebdavclient3WebDAV protocol support
Emailimaplib, smtplibIMAP/SMTP (future)
PDF ProcessingPyPDF2, pdfplumberText extraction

Observability

ComponentTechnologyVersionPurpose
MetricsPrometheus2.47+Metrics collection
Metrics Clientprometheus-client0.19+Python instrumentation
DashboardsGrafana10.2+Visualization
Future: TracingJaeger-Distributed tracing
Future: LoggingLoki-Log aggregation

Infrastructure

ComponentTechnologyVersionPurpose
ContainerizationDocker24+Container runtime
OrchestrationDocker Compose2.23+Multi-container orchestration
Future: K8sKubernetes1.28+Production orchestration
Future: Service MeshLinkerd2.14+mTLS, observability

Architecture Evolution

Phase-by-Phase Evolution

Phase 0-1: Foundation

  • Docker Compose setup
  • PostgreSQL, Redis, Qdrant
  • Health endpoints
  • Database migrations

Phase 2-3: Security & Core Services

  • JWT authentication
  • Password validation and hashing
  • Token revocation
  • Nextcloud integration skeleton
  • API Gateway solidified
  • Core endpoint structure

Phase 4: Realtime Communication

  • WebSocket endpoint
  • QueryOrchestrator integration
  • Message streaming protocol
  • Ping/pong keepalive

Phase 5: Medical AI

  • Document ingestion (PDF, TXT)
  • OpenAI embeddings
  • Qdrant vector storage
  • RAG pipeline
  • Semantic search
  • Citation tracking

Phase 6: Nextcloud Integration

  • CalDAV calendar operations
  • WebDAV file discovery
  • Automatic file indexing
  • Email service skeleton

Phase 7: Admin & RBAC

  • Role-based access control
  • Admin-only endpoints
  • Admin dashboard API
  • Smoke tests for RBAC

Future Phases (8-14):

  • OIDC authentication
  • Complete email integration
  • Frontend apps (Web Client, Admin Panel UI)
  • Voice processing (Thinker-Talker pipeline; legacy Realtime API fallback)
  • Specialized medical models
  • Microservices extraction (if needed)
  • Kubernetes deployment
  • Service mesh (Linkerd)
  • Advanced observability (Jaeger, Loki)

Migration to Microservices (When Needed)

Trigger Conditions:

  • 50 concurrent users

  • Team size > 5 developers
  • Independent scaling requirements
  • Different deployment cycles
  • Regulatory requirements

Extraction Strategy:

  1. Phase 11: Prepare

    • Ensure clean module boundaries
    • Extract shared code to library
    • Define API contracts
    • Independent service tests
  2. Phase 12: Extract Services

    • Start with independent services (Search, PHI Detection)
    • Extract core services (Auth, RAG, Admin)
    • Extract shared services last (Integrations)
  3. Phase 13: Deploy to Kubernetes

    • Create Dockerfiles per service
    • Create K8s manifests (Deployments, Services, ConfigMaps, Secrets)
    • Set up service mesh (Linkerd)
    • Deploy to dev cluster, then production

Design Decisions and Trade-offs

1. Monorepo vs Microservices (Phases 0-10)

Decision: Start with monorepo, maintain logical service boundaries

Rationale:

  • Faster development iteration
  • Simpler debugging (single codebase)
  • Lower operational complexity
  • Easier testing (no distributed systems challenges)
  • Suitable for < 50 concurrent users

Trade-offs:

  • Pros: Speed, simplicity, shared dependencies
  • Cons: Single deployment unit, harder to scale independently
  • Mitigation: Clear module boundaries enable future extraction

2. JWT vs Session-Based Authentication

Decision: JWT with short-lived access tokens + refresh tokens

Rationale:

  • Stateless authentication (scales horizontally)
  • No server-side session storage required
  • Works well with SPAs and mobile apps
  • Industry standard for API authentication

Trade-offs:

  • Pros: Scalable, stateless, widely supported
  • Cons: Cannot revoke tokens without additional infrastructure
  • Mitigation: Redis-based token revocation blacklist

3. Multi-Level Caching (L1 + L2)

Decision: In-memory LRU cache (L1) + Redis distributed cache (L2)

Rationale:

  • L1 provides ultra-low latency for hot data
  • L2 provides distributed caching across instances
  • Automatic promotion from L2 to L1 on cache hits

Trade-offs:

  • Pros: Fast, distributed, high hit rate
  • Cons: More complex invalidation, cache consistency
  • Mitigation: TTLs on all cached data, explicit invalidation APIs

4. OpenAI Embeddings vs Self-Hosted Models

Decision: Use OpenAI text-embedding-3-small for MVP

Rationale:

  • High quality embeddings (1536 dimensions)
  • No infrastructure overhead
  • Fast API responses
  • Easy integration

Trade-offs:

  • Pros: Quality, speed, simplicity
  • Cons: External dependency, cost per API call, data privacy
  • Mitigation: Future migration to BioGPT/PubMedBERT for medical-specific embeddings

5. ARQ vs Celery for Background Jobs

Decision: ARQ (Async Redis Queue)

Rationale:

  • Simpler than Celery (no separate broker required)
  • Native async/await support
  • Lightweight, fast
  • Redis-backed (already using Redis)

Trade-offs:

  • Pros: Simple, async-native, fast
  • Cons: Less mature than Celery, fewer features
  • Mitigation: Sufficient for current needs, can migrate to Celery if needed

6. Docker Compose vs Kubernetes (Phases 0-10)

Decision: Docker Compose for development and initial production

Rationale:

  • Simple local development
  • Easy to understand and debug
  • Suitable for single-server deployment
  • Lower operational complexity

Trade-offs:

  • Pros: Simplicity, speed, low overhead
  • Cons: Limited scaling, no auto-healing, single point of failure
  • Mitigation: Migrate to Kubernetes when scaling requirements justify complexity

7. Nextcloud Separation vs Integrated Deployment

Decision: Nextcloud as separate stack, VoiceAssist as client

Rationale:

  • Nextcloud is complex, mature, independently managed
  • Allows using existing Nextcloud installations
  • Clear separation of concerns
  • Independent update cycles

Trade-offs:

  • Pros: Flexibility, clear boundaries, reuse existing infrastructure
  • Cons: More complex configuration, network dependency
  • Mitigation: Well-defined API contracts, robust error handling

8. Synchronous vs Asynchronous Service Communication

Decision: Synchronous (direct function calls) in monorepo, async (message queue) for long-running jobs

Rationale:

  • Synchronous is simpler and faster for request-response patterns
  • Async is better for fire-and-forget and long-running tasks
  • Most operations in VoiceAssist are request-response

Trade-offs:

  • Pros: Simple, fast, easy to debug
  • Cons: Tighter coupling, harder to scale independently
  • Mitigation: Clear service boundaries enable future async migration

Core Architecture:

Design Documents:

Operations:

Development:

Security & Compliance:


Document Version: 1.0 Last Updated: 2025-11-20 Maintained By: VoiceAssist Development Team Review Cycle: Updated after each major phase completion

Beginning of guide
End of guide