> **⚠️ LEGACY V1 DOCUMENT – NOT CANONICAL FOR V2**
> This describes the original 20-phase plan.
> For the current architecture and phases, see:
>
> - [ARCHITECTURE_V2.md](ARCHITECTURE_V2.md)
> - [DEVELOPMENT_PHASES_V2.md](DEVELOPMENT_PHASES_V2.md)
> - [START_HERE.md](START_HERE.md)
> - [Implementation Status](overview/IMPLEMENTATION_STATUS.md)

# VoiceAssist Architecture

## System Overview

VoiceAssist uses a distributed architecture with components running on macOS (client) and an Ubuntu server (backend services), all accessible via web interfaces.

## Architecture Diagram

```
┌─────────────────────────────────────────────────────────────┐
│                    macOS Client (Local)                     │
│                                                             │
│  ┌─────────────────┐         ┌──────────────────┐           │
│  │  Voice Interface│         │  System Services │           │
│  │  - Wake word    │         │  - Calendar      │           │
│  │  - Realtime API │         │  - Email         │           │
│  │  - Audio stream │         │  - Files         │           │
│  └────────┬────────┘         │  - Reminders     │           │
│           │                  └──────────────────┘           │
│           │                                                 │
│  ┌────────┴───────────────────────────────────┐             │
│  │          AI Orchestrator (Python)          │             │
│  │  - Request routing                         │             │
│  │  - Privacy classifier                      │             │
│  │  - Context management                      │             │
│  └────────┬──────────────┬────────────────────┘             │
│           │              │                                  │
│  ┌────────┴────────┐  ┌──┴──────────────┐                   │
│  │  Local LLM      │  │  File Indexer   │                   │
│  │  (Ollama)       │  │  - Vector search│                   │
│  │  - PHI queries  │  │  - Local docs   │                   │
│  └─────────────────┘  └─────────────────┘                   │
└───────────────────────────────┬─────────────────────────────┘
                                │
                     Secure HTTPS (asimo.io)
                                │
┌───────────────────────────────┴─────────────────────────────┐
│                  Ubuntu Server (asimo.io)                   │
│                                                             │
│  ┌────────────────────────────────────────────────────┐     │
│  │                API Gateway (Nginx)                 │     │
│  └─────┬──────────────┬───────────────┬───────────────┘     │
│        │              │               │                     │
│  ┌─────┴──────┐  ┌────┴─────┐   ┌─────┴──────────┐          │
│  │Voice API   │  │Medical KB│   │Admin API       │          │
│  │Service     │  │Service   │   │Service         │          │
│  └────────────┘  └──────────┘   └────────────────┘          │
│                                                             │
│  ┌──────────────────────────────────────────────────────┐   │
│  │                Medical Knowledge Base                │   │
│  │  ┌────────────────┐    ┌─────────────────────────┐   │   │
│  │  │  Vector DB     │    │  PDF Processing         │   │   │
│  │  │  (Qdrant)      │    │  - Download             │   │   │
│  │  │  - Textbooks   │    │  - OCR                  │   │   │
│  │  │  - Guidelines  │    │  - Indexing             │   │   │
│  │  │  - Journals    │    │  - Storage              │   │   │
│  │  └────────────────┘    └─────────────────────────┘   │   │
│  └──────────────────────────────────────────────────────┘   │
│                                                             │
│  ┌──────────────────────────────────────────────────────┐   │
│  │            External Services Integration             │   │
│  │  - PubMed API                                        │   │
│  │  - OpenEvidence API                                  │   │
│  │  - Nextcloud WebDAV                                  │   │
│  │  - Web scraping service                              │   │
│  └──────────────────────────────────────────────────────┘   │
│                                                             │
│  ┌──────────────────────────────────────────────────────┐   │
│  │                     Data Storage                     │   │
│  │  - PostgreSQL (metadata, users, logs)                │   │
│  │  - Redis (caching, sessions)                         │   │
│  │  - File storage (PDFs, documents)                    │   │
│  └──────────────────────────────────────────────────────┘   │
└───────────────────────────────┬─────────────────────────────┘
                                │
                         HTTPS/WebSocket
                                │
┌───────────────────────────────┴─────────────────────────────┐
│                         Web Clients                         │
│                                                             │
│  ┌─────────────────┐  ┌──────────────┐  ┌────────────────┐  │
│  │  Web App        │  │  Admin Panel │  │  Docs Site     │  │
│  │  (React)        │  │  (React)     │  │  (Next.js)     │  │
│  │  - Voice/Text   │  │  - Config    │  │  - Guides      │  │
│  │  - Chat UI      │  │  - Analytics │  │  - API docs    │  │
│  └─────────────────┘  └──────────────┘  └────────────────┘  │
└─────────────────────────────────────────────────────────────┘
```

## Component Details

### 1. macOS Client

**Voice Interface**

- Continuous audio monitoring with wake word detection (Porcupine)
- Streams to OpenAI Realtime API when activated
- Low-latency speech-to-speech conversation
- Handles interruptions and natural conversation flow

**AI Orchestrator**

- Routes requests based on privacy classification
- Manages conversation context and history
- Coordinates between local and cloud models
- Implements tool calling for system actions

**Local Processing**

- Ollama for local LLM inference
- Vector search over local files
- System integration via AppleScript/shortcuts
- File system indexing and search

**Implementation**: Python daemon + Swift UI (or Electron)

### 2. Ubuntu Server Services

#### Voice API Service

- WebSocket endpoint for web clients
- Proxy to OpenAI Realtime API
- Session management
- Authentication and authorization

#### Medical Knowledge Base Service

- RAG (Retrieval Augmented Generation) pipeline
- Vector similarity search
- Source citation and metadata tracking
- Periodic knowledge base updates

**APIs:**

- `POST /search` - Search medical knowledge
- `GET /textbook/{id}/section/{section}` - Retrieve textbook content
- `POST /journal/search` - Search medical journals
- `POST /journal/download` - Download and process PDF

#### Admin API Service

- System configuration endpoints
- User management
- Usage analytics
- Model selection and settings
- Integration testing

#### PDF Processing Pipeline

1. Download from PubMed, direct links, or upload
2. Extract text (PyPDF2, pdfplumber)
3. OCR if needed (Tesseract)
4. Chunk content intelligently (by section/paragraph)
5. Generate embeddings (OpenAI embeddings or local model)
6. Store in vector DB with metadata
7. Index in PostgreSQL for traditional search

#### External Service Integrations

**PubMed API**

- Search via E-utilities
- Download abstracts and metadata
- Full-text retrieval from PMC

**OpenEvidence API**

- Evidence summary queries
- Clinical question answering
- Guideline recommendations

**Nextcloud Integration**

- WebDAV for file access
- Automatic indexing of medical notes
- Document backup and sync

### 3. Web Application

**Frontend (React + TypeScript)**

- Chat interface with voice input option
- File upload for analysis
- Source citation display
- Conversation history
- Mobile-responsive design

**Features:**

- Text and voice input modes
- Real-time streaming responses
- Code/markdown rendering
- File attachments
- Export conversations

**Communication:**

- WebSocket for real-time chat
- REST API for file operations
- Audio streaming for voice mode

### 4. Admin Panel

**Dashboard Sections:**

1. **System Overview**
   - Active sessions
   - Resource usage (CPU, memory, GPU)
   - API quota usage
   - Error rates

2. **Configuration**
   - Model selection (local vs cloud)
   - API keys management
   - System integrations on/off
   - Privacy settings

3. **Knowledge Base Management**
   - Upload medical textbooks
   - View indexed documents
   - Trigger re-indexing
   - Delete outdated content

4. **User Management**
   - Access control (if multi-user later)
   - Usage limits
   - Audit logs

5. **Analytics**
   - Query patterns
   - Popular topics
   - Response times
   - Cost analysis (API usage)

### 5. Documentation Site

**Content Structure:**

- Getting started guide
- Installation instructions
- User manual
- Medical features guide
- API documentation (if exposing APIs)
- Troubleshooting
- Architecture diagrams

**Implementation**: Next.js with MDX or Docusaurus

## Data Flow Examples

### Example 1: Voice Query with Local Processing

```
1. User speaks: "What's on my calendar today?"
2. Wake word detected → activate Realtime API
3. Speech streamed to OpenAI → transcribed
4. Orchestrator classifies: LOCAL (calendar is system access)
5. Python script calls macOS Calendar via AppleScript
6. Response generated by local Ollama model
7. TTS via OpenAI → played to user
```

### Example 2: Medical Literature Query

```
1. User: "Find recent papers on GLP-1 agonists for heart failure"
2. Orchestrator classifies: CLOUD (medical research, no PHI)
3. Request sent to Ubuntu server medical-kb service
4. Service queries PubMed API
5. Downloads relevant PDFs from PMC
6. OCR/extract text → generate embeddings
7. Store in vector DB
8. Generate summary with GPT-4
9. Return response with citations
10. Display in UI with PDF links
```

### Example 3: Medical Textbook Query

```
1. User: "What does Harrison's say about diabetic ketoacidosis?"
2. Orchestrator classifies: HYBRID
3. Query vector DB for relevant textbook sections
4. Retrieve top 5 matching chunks with metadata
5. Send chunks + query to GPT-4 for synthesis
6. Response includes: "According to Harrison's, Chapter 420, page 2987..."
7. Return with page references and option to read more
```

## Privacy Architecture

### Data Classification

**Tier 1 - Strictly Local (PHI/Sensitive)**

- Patient notes
- Personal medical records
- Sensitive personal files
- Never sent to external APIs
- Processed by local Ollama only

**Tier 2 - Server (Private but not PHI)**

- Personal documents
- Email content
- Calendar details
- Stored on Ubuntu server
- Not sent to commercial APIs

**Tier 3 - Cloud OK (Public/General Knowledge)**

- Medical literature queries
- General medical questions
- Web searches
- Can use OpenAI/Claude APIs

### Classification Logic

- Keyword detection (patient names, MRN, etc.)
- File path analysis (/Medical-Records/\* = local)
- User tagging (mark conversations as sensitive)
- Default: assume Tier 1 unless explicitly cleared

## Security Considerations

1. **Authentication**
   - API key auth for server communication
   - OAuth for web clients (optional multi-user)
   - mTLS for macOS client ↔ server

2. **Encryption**
   - HTTPS/WSS for all network communication
   - Encrypted storage for sensitive data
   - Encrypted backups to Nextcloud

3. **Access Control**
   - File system permissions
   - API rate limiting
   - Audit logging

4. **HIPAA Considerations**
   - Business Associate Agreements needed if using OpenAI with PHI
   - Current design: never send PHI to OpenAI
   - Document data handling policies

## Scalability Considerations

**Current Design**: Single-user, personal use

**Future Expansion Possibilities**:

- Multi-user support (family members, colleagues)
- Horizontal scaling of server services
- Multiple macOS/iOS clients
- Shared knowledge base with privacy isolation
- Team collaboration features

## Deployment Architecture

### macOS Client

- LaunchAgent for auto-start
- Menu bar app
- System permissions (microphone, accessibility)
- Auto-update mechanism

### Ubuntu Server

- Docker Compose for service orchestration
- Nginx reverse proxy
- Let's Encrypt SSL certificates
- Systemd for service management
- Automated backups

### Monitoring

- Prometheus + Grafana for metrics
- Log aggregation (Loki or ELK)
- Alerting (if server issues)
- Usage tracking (anonymized)

## Technology Choices Rationale

**FastAPI**: Modern, fast, async Python framework with automatic API docs

**PostgreSQL + pgvector**: Mature relational DB with vector extension

**Qdrant/Weaviate**: Purpose-built vector databases for semantic search

**React**: Popular, well-documented, large ecosystem

**Ollama**: Simple local LLM deployment, supports many models

**OpenAI Realtime API**: Best-in-class voice interface, low latency

**Docker**: Consistent deployment, easy service isolation
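
As an illustration of how requests might be divided between the local Ollama model and the cloud APIs named above, the rules listed under Classification Logic (keyword detection, file path analysis, user tagging, and a Tier 1 default) can be sketched in Python. This is a minimal sketch, not the project's implementation: the `Tier`, `Request`, and `classify` names, the regular expressions, and the `cleared_tier` field (one possible way to model "explicitly cleared") are all assumptions made for the example.

```python
import re
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional, Tuple


class Tier(IntEnum):
    LOCAL = 1   # Tier 1: PHI/sensitive, processed by local Ollama only
    SERVER = 2  # Tier 2: private but not PHI, stays on the Ubuntu server
    CLOUD = 3   # Tier 3: public/general knowledge, cloud APIs allowed


# Hypothetical keyword rules; a real deployment would need a vetted list.
SENSITIVE_PATTERNS = (
    re.compile(r"\bMRN\b", re.IGNORECASE),
    re.compile(r"\bpatient\s+(?:name|notes?|records?)\b", re.IGNORECASE),
)

# Path prefix taken from the document's "/Medical-Records/* = local" rule.
LOCAL_PATH_PREFIXES = ("/Medical-Records/",)


@dataclass(frozen=True)
class Request:
    text: str
    file_paths: Tuple[str, ...] = ()
    user_marked_sensitive: bool = False
    # Assumption: a caller may explicitly clear a request for a less
    # restrictive tier; None means nothing has been cleared.
    cleared_tier: Optional[Tier] = None


def classify(request: Request) -> Tier:
    """Return the privacy tier for a request, failing closed to Tier 1."""
    # Any sensitivity signal wins over an explicit clearance.
    if request.user_marked_sensitive:
        return Tier.LOCAL
    if any(path.startswith(LOCAL_PATH_PREFIXES) for path in request.file_paths):
        return Tier.LOCAL
    if any(pattern.search(request.text) for pattern in SENSITIVE_PATTERNS):
        return Tier.LOCAL
    if request.cleared_tier is not None:
        return request.cleared_tier
    # Default: assume Tier 1 unless explicitly cleared.
    return Tier.LOCAL


if __name__ == "__main__":
    literature = Request(
        "Find recent papers on GLP-1 agonists for heart failure",
        cleared_tier=Tier.CLOUD,
    )
    chart = Request("What is the MRN on this chart?", cleared_tier=Tier.CLOUD)
    print(classify(literature).name)  # CLOUD
    print(classify(chart).name)       # LOCAL
```

Checking the sensitivity signals before the clearance means a request that was cleared for the cloud still stays local if it mentions an MRN or touches a protected path, which matches the stated design goal of never sending PHI to OpenAI.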