segmentation-3.0

Pyannote segmentation-3.0 is a speaker segmentation model for detecting speaker changes, overlapping speech, and voice activity in audio. It processes 10-second chunks of 16 kHz mono audio and produces frame-level predictions that serve as input to the full speaker diarization pipeline. The model can also run standalone for voice activity detection or overlapped speech detection without the full diarization stack.
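
Because one output covers several tasks, the per-frame predictions can be reduced to voice activity and overlapped speech flags. The sketch below assumes the 7-class "powerset" ordering described on the model card (non-speech; speakers #1, #2, #3; speaker pairs #1+#2, #1+#3, #2+#3) and takes per-frame class indices, for example after an argmax, as plain integers. decode_powerset and POWERSET_CLASSES are hypothetical names for illustration, not part of the pyannote.audio API.

```python
from typing import FrozenSet, List, Tuple

# Assumed powerset class order (per the segmentation-3.0 model card):
# index 0 = non-speech, 1-3 = single speakers, 4-6 = pairs of overlapping speakers.
POWERSET_CLASSES: List[FrozenSet[int]] = [
    frozenset(),        # 0: non-speech
    frozenset({1}),     # 1: speaker #1
    frozenset({2}),     # 2: speaker #2
    frozenset({3}),     # 3: speaker #3
    frozenset({1, 2}),  # 4: speakers #1 and #2
    frozenset({1, 3}),  # 5: speakers #1 and #3
    frozenset({2, 3}),  # 6: speakers #2 and #3
]


def decode_powerset(class_ids: List[int]) -> Tuple[List[bool], List[bool]]:
    """Hypothetical helper: map per-frame powerset class indices to
    (voice activity, overlapped speech) flags for each frame."""
    speech: List[bool] = []
    overlap: List[bool] = []
    for cid in class_ids:
        active = POWERSET_CLASSES[cid]
        speech.append(len(active) >= 1)   # any speaker active -> speech
        overlap.append(len(active) >= 2)  # two speakers active -> overlap
    return speech, overlap


if __name__ == "__main__":
    frames = [0, 0, 1, 1, 4, 4, 2, 0]
    vad, osd = decode_powerset(frames)
    print(vad)  # [False, False, True, True, True, True, True, False]
    print(osd)  # [False, False, False, False, True, True, False, False]
```

A frame counts as speech when any speaker is active and as overlap when two are. In practice the pyannote.audio pipelines handle this step for you; the helper only illustrates the idea.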

Use cases

  • Voice activity detection to identify speech vs. non-speech regions
  • Speaker change detection as preprocessing for downstream diarization
  • Overlapping speech detection in multi-party conversations
  • Audio preprocessing to remove silence before ASR
  • Segmentation component of the pyannote speaker diarization pipeline
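
For the silence-removal use case, speech regions found by VAD (in seconds) can be used to keep only the matching samples before passing audio to ASR. A minimal sketch, assuming mono audio held as a list of floats at 16 kHz; keep_speech is a hypothetical helper, and a real pipeline would more likely operate on NumPy or torch arrays.

```python
from typing import List, Sequence, Tuple


def keep_speech(
    samples: Sequence[float],
    segments: List[Tuple[float, float]],
    sample_rate: int = 16_000,
) -> List[float]:
    """Hypothetical helper: concatenate the samples that fall inside
    (start, end) speech segments given in seconds, dropping everything else."""
    # Merge overlapping or touching segments so no sample is copied twice.
    merged: List[Tuple[float, float]] = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))

    kept: List[float] = []
    for start, end in merged:
        # Convert seconds to sample indices and clamp to the signal bounds.
        lo = max(0, int(round(start * sample_rate)))
        hi = min(len(samples), int(round(end * sample_rate)))
        if hi > lo:
            kept.extend(samples[lo:hi])
    return kept
```

Merging first keeps the output free of duplicated audio when VAD segments overlap, and clamping guards against segments that run past the end of the file.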

Pros

  • MIT license
  • Handles voice activity, speaker change, and overlapping speech in a single model
  • Can run standalone for VAD without the full diarization stack
  • State-of-the-art segmentation performance on pyannote benchmarks
  • Integrates directly with speaker-diarization-3.1
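
Speaker change detection falls out of the same single output: a change point is any frame boundary where the set of active speakers differs from the previous frame. The sketch below assumes per-frame active-speaker sets (for example, decoded from the model's powerset classes) and a known frame step in seconds; the helper name and the frame step used in any example are illustrative assumptions, not values from the model card.

```python
from typing import FrozenSet, List, Sequence


def change_points(
    active: Sequence[FrozenSet[int]],
    frame_step: float,
) -> List[float]:
    """Hypothetical helper: return the times (in seconds) at which the set of
    active speakers differs from the previous frame. Speech onsets and
    offsets count as changes too."""
    times: List[float] = []
    for i in range(1, len(active)):
        if active[i] != active[i - 1]:
            # The change happens at the boundary between frame i-1 and frame i.
            times.append(i * frame_step)
    return times
```

Comparing whole speaker sets rather than a single "current speaker" means the start and end of an overlap are reported as changes as well.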

Cons

  • Gated on HuggingFace: you must accept the model's user conditions and authenticate with an access token to download it
  • Frame-level model output requires post-processing for usable timestamps
  • The powerset output represents at most 2 simultaneous speakers (and at most 3 speakers per 10-second chunk), so overlaps involving more than 2 speakers are not captured
  • Not designed for keyword spotting or speech content analysis
  • Performance varies with recording quality and background noise level
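
The post-processing step that turns frame-level scores into usable timestamps typically combines thresholding with duration rules. A minimal sketch, assuming per-frame speech probabilities and a fixed frame step in seconds; binarize and its defaults are hypothetical, although pyannote.audio's VAD pipeline exposes similarly named min_duration_on and min_duration_off hyperparameters. This is an illustration, not pyannote's implementation.

```python
from typing import List, Sequence, Tuple


def binarize(
    scores: Sequence[float],
    frame_step: float,
    onset: float = 0.5,
    offset: float = 0.5,
    min_duration_on: float = 0.0,
    min_duration_off: float = 0.0,
) -> List[Tuple[float, float]]:
    """Hypothetical sketch: turn per-frame speech probabilities into
    (start, end) segments in seconds.

    - Hysteresis: a region starts when score > onset and ends when score < offset.
    - Gaps shorter than min_duration_off are filled.
    - Segments shorter than min_duration_on are dropped.
    """
    segments: List[Tuple[float, float]] = []
    active = False
    start = 0.0
    for i, score in enumerate(scores):
        t = i * frame_step
        if not active and score > onset:
            active, start = True, t
        elif active and score < offset:
            segments.append((start, t))
            active = False
    if active:
        # Close a region that is still open at the end of the signal.
        segments.append((start, len(scores) * frame_step))

    # Fill short non-speech gaps between consecutive segments.
    merged: List[Tuple[float, float]] = []
    for seg in segments:
        if merged and seg[0] - merged[-1][1] < min_duration_off:
            merged[-1] = (merged[-1][0], seg[1])
        else:
            merged.append(seg)

    # Drop segments that are too short to keep.
    return [(s, e) for s, e in merged if e - s >= min_duration_on]
```

Using a lower offset than onset keeps a region open through brief dips in confidence, and filling gaps before dropping short segments lets several short bursts combine into one kept segment.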

FAQ

What is segmentation-3.0 used for?

segmentation-3.0 is used for voice activity detection (speech vs. non-speech), speaker change detection ahead of diarization, overlapped speech detection in multi-party conversations, and removing silence before ASR. It is also the segmentation component of the pyannote speaker diarization pipeline.

Is segmentation-3.0 free to use?

Yes. segmentation-3.0 is open source and released under the MIT license, as listed on its HuggingFace model card. Downloading it still requires accepting the model's user conditions on HuggingFace.

How do I run segmentation-3.0 locally?

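segmentation-3.0 is not a transformers model; it is loaded with pyannote.audio (version 3.0 or later). Install pyannote.audio, accept the user conditions on the HuggingFace model page, and create an access token. In pyannote.audio 3.x the model is then loaded with Model.from_pretrained("pyannote/segmentation-3.0", use_auth_token=HF_TOKEN), where HF_TOKEN is your access token, and can be wrapped in the VoiceActivityDetection or OverlappedSpeechDetection pipelines for standalone use. See the model card for hardware requirements.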

Tags

pyannote-audio, pytorch, pyannote, pyannote-audio-model, audio, voice, speech, speaker, speaker-diarization, speaker-change-detection, speaker-segmentation, voice-activity-detection, overlapped-speech-detection, resegmentation, license:mit, region:us