Model Version Pinning

Sourced from docs/voice/MODEL_VERSIONS.md

This document tracks every Hugging Face model revision used in the VoiceAssist platform. Each model is pinned to a specific revision for supply-chain security; Bandit check B615 flags Hugging Face model downloads that do not specify one.

Current Pinned Versions

| Model | Model ID | Revision | Version | Last Updated |
|---|---|---|---|---|
| Speaker Diarization | pyannote/speaker-diarization-3.1 | cb03e11cae0c1f3c75fd7be406b7f0bbf33cd28c | v3.1.0 | 2024-12-04 |
| Speaker Embedding | pyannote/embedding | a9b3e59b43ceb4a4b04fb82bc7a1c36da47fe18a | Latest | 2024-12-04 |
| PHI NER Model | roberta-base-phi-i2b2 | main | Local | 2024-12-04 |
| ML Classifier Tokenizer | distilbert-base-uncased | Local path | N/A | 2024-12-04 |
| Medical Embeddings | See medical_embedding_service.py | Configured | v1.0 | 2024-12-04 |
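
A revision only protects against upstream changes if it names an immutable commit. The following Python sketch is illustrative and not part of the VoiceAssist codebase; the names `PINS`, `is_immutable_revision` and `unpinned` are hypothetical. It checks the Hub revisions from the table and reports any that are not full 40-character commit hashes. A branch name such as `main` resolves to whatever the branch points to at download time, so the PHI NER entry is only effectively pinned if, as its "Local" version suggests, it is loaded from a local copy.

```python
import re

# A full Git commit hash on the Hugging Face Hub: 40 lowercase hex characters.
COMMIT_SHA = re.compile(r"[0-9a-f]{40}")

# Hub revisions copied from the "Current Pinned Versions" table (hypothetical variable).
PINS = {
    "pyannote/speaker-diarization-3.1": "cb03e11cae0c1f3c75fd7be406b7f0bbf33cd28c",
    "pyannote/embedding": "a9b3e59b43ceb4a4b04fb82bc7a1c36da47fe18a",
    "roberta-base-phi-i2b2": "main",
}


def is_immutable_revision(revision: str) -> bool:
    """True only for a full commit hash; branch and tag names can move."""
    return COMMIT_SHA.fullmatch(revision) is not None


def unpinned(pins: dict) -> list:
    """Sorted model IDs whose revision is not a full commit hash."""
    return sorted(m for m, rev in pins.items() if not is_immutable_revision(rev))


if __name__ == "__main__":
    print(unpinned(PINS))  # ['roberta-base-phi-i2b2']
```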

Files with Model Loading

| File | Models Loaded | Status |
|---|---|---|
| services/api-gateway/app/engines/clinical_engine/enhanced_phi_detector.py | PHI NER | Pinned |
| services/api-gateway/app/services/speaker_diarization_service.py | Diarization, Embedding | Pinned |
| services/api-gateway/app/engines/conversation_engine/ml_classifier.py | DistilBERT tokenizer | Local path (nosec) |
| services/api-gateway/app/services/medical_embedding_service.py | Various medical embeddings | Already pinned |
| services/api-gateway/app/services/medical_embeddings.py | Various medical embeddings | Already pinned |

Updating Model Versions

When upgrading a model:

  1. Test the new version in a non-production environment
  2. Verify model performance meets quality thresholds
  3. Get the new commit hash from the Hugging Face Hub
  4. Update the revision in the file that loads the model (see Files with Model Loading)
  5. Update this document with the new hash and date
  6. Run the security scan: `bandit -r services/ --severity-level medium`
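
Steps 4 and 5 are easier to keep consistent when every pin lives in one place. The following is a minimal sketch under that assumption: `MODEL_PINS`, `pretrained_kwargs` and `bump_pin` are hypothetical names, not existing VoiceAssist code. Loading code asks the registry for its revision, and an upgrade rejects anything that is not a full commit hash.

```python
import re
from datetime import date
from typing import Dict, Optional

# Full 40-character lowercase hex commit hash, as shown on the Hugging Face Hub.
_COMMIT_SHA = re.compile(r"[0-9a-f]{40}")

# Hypothetical single source of truth for model pins (illustrative only).
MODEL_PINS: Dict[str, Dict[str, str]] = {
    "pyannote/speaker-diarization-3.1": {
        "revision": "cb03e11cae0c1f3c75fd7be406b7f0bbf33cd28c",
        "updated": "2024-12-04",
    },
}


def pretrained_kwargs(model_id: str) -> Dict[str, str]:
    """Keyword arguments for a loader call; refuses models that have no pin."""
    if model_id not in MODEL_PINS:
        raise LookupError(f"{model_id} has no pinned revision")
    return {"revision": MODEL_PINS[model_id]["revision"]}


def bump_pin(model_id: str, new_sha: str, when: Optional[date] = None) -> None:
    """Record a new commit hash (step 4) and the date to copy into this doc (step 5)."""
    if not _COMMIT_SHA.fullmatch(new_sha):
        raise ValueError(f"not a full commit hash: {new_sha!r}")
    MODEL_PINS[model_id] = {
        "revision": new_sha,
        "updated": (when or date.today()).isoformat(),
    }
```

With transformers, a call would look like `AutoModel.from_pretrained(model_id, **pretrained_kwargs(model_id))`. Other loaders, such as pyannote's, may expect the revision in a different form, so check their documentation before reusing this pattern.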

Getting Revision Hashes

To find the current commit hash for a HuggingFace model:

```shell
# Via the Hub HTTP API: the JSON response includes a "sha" field
curl -s https://huggingface.co/api/models/pyannote/speaker-diarization-3.1
```

```python
# Or via Python
from huggingface_hub import HfApi

api = HfApi()
info = api.repo_info("pyannote/speaker-diarization-3.1")
print(info.sha)  # Current head commit hash of the repository
```
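
The hash returned by the Hub is the repository's current head, which may differ from the pinned one. A sketch for spotting pins that have fallen behind, assuming pins are kept as a `{model_id: commit_hash}` mapping; `find_drift` and `fetch_upstream` are hypothetical helpers, and only `fetch_upstream` needs network access and `huggingface_hub`.

```python
from typing import Dict, Iterable, Tuple


def find_drift(
    pinned: Dict[str, str], upstream: Dict[str, str]
) -> Dict[str, Tuple[str, str]]:
    """Map model_id -> (pinned_sha, upstream_sha) wherever the Hub head differs."""
    return {
        model_id: (sha, upstream[model_id])
        for model_id, sha in pinned.items()
        if model_id in upstream and upstream[model_id] != sha
    }


def fetch_upstream(model_ids: Iterable[str]) -> Dict[str, str]:
    """Current head commit hash of each repo on the Hub (requires network access)."""
    # Imported here so find_drift() can be used and tested without huggingface_hub.
    from huggingface_hub import HfApi

    api = HfApi()
    return {model_id: api.repo_info(model_id).sha for model_id in model_ids}
```

Usage would be `find_drift(pins, fetch_upstream(pins))`. A non-empty result only means a newer upstream commit exists; it should go through the upgrade steps above rather than being adopted automatically.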

Security Notes

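  • B615: Bandit flags Hugging Face `from_pretrained()` calls that do not pin a `revision`
  • Local paths: loading from a local path downloads nothing from the Hub and is not exposed to upstream changes, so these calls are marked `# nosec B615`
  • Pinned revisions: a fixed commit hash prevents a malicious or unexpected update to the upstream repository from changing the weights that are loaded
  • Review process: all model updates require a security review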
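
To see roughly which calls B615 is concerned with, the following sketch uses Python's standard `ast` module to list `from_pretrained()` calls that pass no `revision=` keyword. It is an approximation, not Bandit's implementation: it does not recognize local paths or `# nosec` comments, accepts any `revision` value (including branch names such as `main`), and cannot see a revision passed through `**kwargs`. The function name `unpinned_from_pretrained` is hypothetical; Bandit remains the authoritative check.

```python
import ast
from typing import List

SAMPLE = '''
from transformers import AutoModel
a = AutoModel.from_pretrained("distilbert-base-uncased")
b = AutoModel.from_pretrained("distilbert-base-uncased", revision="main")
'''


def unpinned_from_pretrained(source: str) -> List[int]:
    """Return line numbers of *.from_pretrained(...) calls with no revision= keyword."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "from_pretrained"
            # A **kwargs entry has kw.arg == None, so it never counts as a revision.
            and not any(kw.arg == "revision" for kw in node.keywords)
        ):
            hits.append(node.lineno)
    return sorted(hits)


if __name__ == "__main__":
    print(unpinned_from_pretrained(SAMPLE))  # [3]
```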

Updated: 2024-12-04
Related: Post-v4.1 Roadmap
