Pyannote speaker-diarization-3.1 is a complete speaker diarization pipeline from pyannote.audio that answers "who spoke when" in an audio recording. It segments audio into speaker-homogeneous regions, clusters them by speaker identity using embedding models, and outputs timestamped speaker labels. It is commonly used in meeting transcription, podcast editing, and call center analytics.
10,315,718 ↓ · 1,829 ♡
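As an illustration of the "timestamped speaker labels" mentioned for speaker-diarization-3.1, the sketch below post-processes a list of diarization segments into longer speaker turns and per-speaker talk time. It does not call pyannote.audio; the `(start, end, speaker)` tuple layout, the `SPEAKER_00`-style labels, and the helper names `merge_turns` and `talk_time` are assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical segment layout: (start_sec, end_sec, speaker_label)
Segment = Tuple[float, float, str]


def merge_turns(segments: List[Segment], max_gap: float = 0.5) -> List[Segment]:
    """Merge consecutive same-speaker segments separated by at most max_gap seconds."""
    merged: List[Segment] = []
    for start, end, speaker in sorted(segments):
        if merged and merged[-1][2] == speaker and start - merged[-1][1] <= max_gap:
            prev_start, prev_end, _ = merged[-1]
            # Extend the previous turn instead of opening a new one.
            merged[-1] = (prev_start, max(prev_end, end), speaker)
        else:
            merged.append((start, end, speaker))
    return merged


def talk_time(segments: List[Segment]) -> Dict[str, float]:
    """Sum the duration of all segments per speaker label."""
    totals: Dict[str, float] = {}
    for start, end, speaker in segments:
        totals[speaker] = totals.get(speaker, 0.0) + (end - start)
    return totals


if __name__ == "__main__":
    raw = [
        (0.0, 2.0, "SPEAKER_00"),
        (2.2, 4.0, "SPEAKER_00"),
        (4.5, 6.0, "SPEAKER_01"),
        (7.0, 8.0, "SPEAKER_00"),
    ]
    turns = merge_turns(raw)
    for start, end, speaker in turns:
        print(f"{start:6.2f}s to {end:6.2f}s  {speaker}")
    print(talk_time(turns))
```

Merging only looks at the immediately preceding turn, so recordings with heavily overlapping speech would need a per-speaker pass instead.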
WhisperKit CoreML is a collection of Whisper speech recognition models exported to Apple's CoreML format by Argmax, enabling on-device ASR on Apple Silicon (iPhone, iPad, Mac) without network calls. The models run via the WhisperKit framework, which handles chunking, VAD, and decoding on-device. Designed for iOS/macOS applications requiring offline transcription.
10,290,048 ↓ · 172 ♡
Whisper Large-v3-Turbo is a pruned version of Whisper Large-v3, fine-tuned to retain most of the large model's transcription accuracy at substantially lower inference cost. It supports over 99 languages and reduces the decoder from 32 layers to 4, which speeds up decoding at the price of a small drop in quality. MIT licensed and directly compatible with HuggingFace's Whisper inference pipeline.
7,653,767 ↓ · 2,994 ♡
Whisper Large-v3 is OpenAI's full-size ASR model supporting 99+ languages, trained on roughly 1 million hours of weakly labeled audio and 4 million hours of pseudo-labeled audio. It delivers state-of-the-art transcription accuracy across languages at the cost of significant inference compute. Apache 2.0 licensed. The Large-v3-Turbo variant (a pruned, fine-tuned version) provides similar quality at lower cost for most use cases.
4,984,608 ↓ · 5,654 ♡
A Russian-language ASR model fine-tuned from Facebook's wav2vec2-large-xlsr-53 (cross-lingual pre-training on 53 languages) on Russian data from Mozilla Common Voice, including Common Voice 6.0. Produces Russian text transcriptions from audio using CTC decoding. Community-contributed under Apache 2.0.
4,567,759 ↓ · 74 ♡
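The CTC decoding step mentioned for the wav2vec2 fine-tunes can be sketched as a greedy decoder: take the most likely token per audio frame, collapse consecutive repeats, then drop the blank token. This is a minimal illustration, not the model's own decoding code; the toy vocabulary, the blank index `0`, and the function name `ctc_greedy_decode` are assumptions for the example.

```python
from typing import Sequence


def ctc_greedy_decode(frame_ids: Sequence[int], vocab: Sequence[str], blank_id: int = 0) -> str:
    """Collapse repeated frame predictions and remove blanks (greedy CTC decoding).

    frame_ids holds the argmax token index for each audio frame.
    """
    chars = []
    prev = None
    for idx in frame_ids:
        # Emit a token only when it differs from the previous frame and is not blank.
        if idx != prev and idx != blank_id:
            chars.append(vocab[idx])
        prev = idx
    return "".join(chars)


if __name__ == "__main__":
    vocab = ["_", "д", "а"]  # toy vocabulary; "_" stands in for the CTC blank
    frames = [0, 1, 1, 0, 2, 2, 2, 0, 0, 1, 0, 2]
    print(ctc_greedy_decode(frames, vocab))  # дада
```

A blank between two identical tokens is what lets CTC emit doubled letters: `[1, 0, 1]` decodes to two characters, while `[1, 1]` decodes to one.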
MMS-300M-1130-forced-aligner is Meta's 300M-parameter wav2vec2-based model fine-tuned for forced phoneme-level alignment across 1,130 languages. It takes audio and a text transcript as input and outputs word- or phoneme-level timestamps, enabling subtitle synchronization and linguistic documentation at scale. The CC-BY-NC-4.0 license restricts commercial deployment.
3,619,474 ↓ · 87 ♡
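To make the forced aligner's timestamp output concrete, the sketch below converts a frame-level alignment path into character time spans. It assumes the alignment itself has already been computed and is given as one transcript index per frame (`-1` for blank), and it assumes a 20 ms frame step, which matches wav2vec2's 320-sample stride at 16 kHz. The function name `path_to_spans` and the path encoding are hypothetical, not the model's actual interface.

```python
from typing import List, Tuple


def path_to_spans(
    path: List[int], transcript: str, frame_sec: float = 0.02
) -> List[Tuple[str, float, float]]:
    """Turn a per-frame alignment path into (char, start_sec, end_sec) spans.

    path[t] is the index into transcript aligned to frame t, or -1 for blank.
    """
    spans: List[Tuple[str, float, float]] = []
    current = -1  # transcript index of the span being built, -1 if none
    start = 0     # frame where the current span began
    for t, idx in enumerate(path + [-1]):  # trailing blank flushes the last span
        if idx != current:
            if current != -1:
                spans.append(
                    (transcript[current], round(start * frame_sec, 3), round(t * frame_sec, 3))
                )
            current = idx
            start = t
    return spans


if __name__ == "__main__":
    # Frames: blank, h, h, blank, i, i, i, blank
    for char, begin, end in path_to_spans([-1, 0, 0, -1, 1, 1, 1, -1], "hi"):
        print(f"{char}: {begin:.2f}s to {end:.2f}s")
```

Word-level timestamps would follow by grouping consecutive character spans between word separators and taking the first start and last end.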
wav2vec2-large-xlsr-53-portuguese is an XLSR-53 model fine-tuned on Portuguese Common Voice data for automatic speech recognition using CTC decoding on 16 kHz mono audio. It achieves competitive word error rates on both European and Brazilian Portuguese test sets. Part of the community XLSR fine-tuning effort from the 2021 HuggingFace strong speech event.
3,471,224 ↓ · 53 ♡
voice-activity-detection is an open-source voice activity detection model listed under the automatic-speech-recognition category on HuggingFace. Details are sourced from the public model registry.
3,113,633 ↓ · 232 ♡
speaker-diarization-community-1 is an open-source speaker diarization pipeline listed under the automatic-speech-recognition category on HuggingFace. Details are sourced from the public model registry.
2,836,290 ↓ · 346 ♡
whisper-small is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.
2,251,456 ↓ · 552 ♡
Qwen3-ASR-1.7B is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.
1,898,223 ↓ · 784 ♡
whisper-base is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.
1,726,597 ↓ · 268 ♡
wav2vec2-large-xlsr-53-polish is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.
1,545,495 ↓ · 12 ♡
distil-large-v3 is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.
1,279,209 ↓ · 376 ♡
wav2vec2-base-960h is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.
1,235,473 ↓ · 396 ♡
mms-1b-all is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.
1,158,651 ↓ · 198 ♡
faster-whisper-tiny.en is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.
1,153,747 ↓ · 9 ♡
Voxtral-Mini-4B-Realtime-2602 is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.
1,109,913 ↓ · 838 ♡
wav2vec2-large-xlsr-53-japanese is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.
1,083,424 ↓ · 56 ♡
wav2vec2-large-xlsr-53-chinese-zh-cn is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.
1,039,395 ↓ · 133 ♡
speaker-diarization is an open-source speaker diarization pipeline listed under the automatic-speech-recognition category on HuggingFace. Details are sourced from the public model registry.
879,833 ↓ · 1,268 ♡
wav2vec2-xls-r-300m-cv7-turkish is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.
846,821 ↓ · 14 ♡
faster-whisper-tiny is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.
830,338 ↓ · 19 ♡
whisper-medium is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.
825,694 ↓ · 284 ♡
faster-whisper-large-v3 is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.
811,408 ↓ · 569 ♡
whisper-tiny is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.
802,648 ↓ · 427 ♡
parakeet-ctc-1.1b is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.
719,583 ↓ · 46 ♡
parakeet-tdt-0.6b-v3 is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.
676,469 ↓ · 40 ♡
Wav2Vec2-large-xlsr-hindi is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.
675,944 ↓ · 12 ♡
faster-whisper-base is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.
643,938 ↓ · 23 ♡
overlapped-speech-detection is an open-source overlapped speech detection model listed under the automatic-speech-recognition category on HuggingFace. Details are sourced from the public model registry.
628,781 ↓ · 56 ♡
reverb-diarization-v1 is an open-source speaker diarization model listed under the automatic-speech-recognition category on HuggingFace. Details are sourced from the public model registry.
627,408 ↓ · 13 ♡
VibeVoice-ASR is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.
625,932 ↓ · 1,114 ♡
wav2vec2-large-xlsr-korean is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.
605,223 ↓ · 55 ♡
wav2vec2-large-xlsr-53-arabic is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.
586,689 ↓ · 54 ♡
Qwen3-ASR-0.6B is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.
542,556 ↓ · 285 ♡
speakerkit-pro is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.
541,062 ↓ · 20 ♡
faster-whisper-small is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.
503,535 ↓ · 31 ♡
w2v-xls-r-uk is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.
494,768 ↓ · 8 ♡
wav2vec2-xlsr-53-espeak-cv-ft is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.
471,866 ↓ · 49 ♡
wav2vec2-large-xlsr-53-th is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.
450,914 ↓ · 27 ♡
Phi-4-multimodal-instruct is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.
427,637 ↓ · 1,597 ♡
canary-1b-flash is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.
422,833 ↓ · 271 ♡
parakeet-tdt-0.6b-v3-coreml is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.
420,190 ↓ · 41 ♡
hubert-large-ls960-ft is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.
419,417 ↓ · 76 ♡
Qwen3-ForcedAligner-0.6B is an open-source forced alignment model listed under the automatic-speech-recognition category on HuggingFace. Details are sourced from the public model registry.
418,055 ↓ · 125 ♡
parakeet-tdt-0.6b-v2 is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.
412,282 ↓ · 40 ♡
parakeet-tdt-0.6b-v3 is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.
402,242 ↓ · 824 ♡
wav2vec2-lv-60-espeak-cv-ft is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.
395,202 ↓ · 67 ♡
wav2vec2-large-xlsr-53-dutch is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.
380,163 ↓ · 14 ♡
parakeetkit-pro is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.
375,705 ↓ · 4 ♡
granite-speech-3.3-2b is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.
356,666 ↓ · 54 ♡
speaker-diarization-3.0 is an open-source speaker diarization pipeline listed under the automatic-speech-recognition category on HuggingFace. Details are sourced from the public model registry.
340,211 ↓ · 215 ♡
wav2vec2-indonesian-javanese-sundanese is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.
340,026 ↓ · 12 ♡
T-one is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.
293,020 ↓ · 90 ♡
wav2vec2-large-xlsr-53-greek is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.
292,912 ↓ · 3 ♡
nb-wav2vec2-1b-nynorsk is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.
284,389 ↓ · 0 ♡