AI Tools.

automatic speech recognition models

57 models · ranked by HuggingFace downloads

speaker-diarization-3.1

Pyannote speaker-diarization-3.1 is a complete speaker diarization pipeline from pyannote.audio that answers "who spoke when" in an audio recording. It segments audio into speaker-homogeneous regions, clusters those regions by speaker identity using embedding models, and outputs timestamped speaker labels. It is used in meeting transcription, podcast editing, and call center analytics.

10,315,718 ↓ · 1,829 ♡
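
The segment, cluster, and label steps described above can be illustrated with a toy example. This is a minimal sketch, not pyannote.audio's implementation: it assumes the segments and their embeddings are already computed, and it swaps in a simple greedy cosine-similarity assignment for the pipeline's real clustering. The function names, the `threshold` value, and the `SPEAKER_NN` label format are illustrative assumptions.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def diarize(segments, threshold=0.8):
    """Toy diarization over (start, end, embedding) segments.

    Each segment joins the most similar known speaker if the similarity
    reaches `threshold`; otherwise it opens a new speaker. Contiguous
    segments with the same label are merged into one turn.
    """
    references = []  # first embedding seen for each speaker
    turns = []
    for start, end, emb in segments:
        best, best_sim = None, threshold
        for idx, ref in enumerate(references):
            sim = cosine(emb, ref)
            if sim >= best_sim:
                best, best_sim = idx, sim
        if best is None:
            references.append(emb)
            best = len(references) - 1
        label = f"SPEAKER_{best:02d}"
        if turns and turns[-1][2] == label and abs(turns[-1][1] - start) < 1e-9:
            turns[-1] = (turns[-1][0], end, label)  # extend the previous turn
        else:
            turns.append((start, end, label))
    return turns


# Hypothetical 2-dimensional embeddings; real ones have hundreds of dimensions.
segments = [
    (0.0, 1.5, [1.0, 0.1]),
    (1.5, 3.0, [0.9, 0.2]),
    (3.0, 4.0, [0.1, 1.0]),
    (4.0, 5.5, [1.0, 0.0]),
]
turns = diarize(segments)
for start, end, label in turns:
    print(f"{start:4.1f} - {end:4.1f}  {label}")
```

The output is a list of timestamped speaker turns, which is the form of result a diarization pipeline hands to downstream transcription or analytics code.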

whisperkit-coreml

WhisperKit CoreML is a collection of Whisper speech recognition models exported to Apple's CoreML format by Argmax, enabling on-device ASR on Apple Silicon (iPhone, iPad, Mac) without network calls. The models run via the WhisperKit framework, which handles chunking, VAD, and decoding on-device. Designed for iOS/macOS applications requiring offline transcription.

10,290,048 ↓ · 172 ♡

whisper-large-v3-turbo

Whisper Large-v3-Turbo is a pruned version of Whisper Large-v3, fine-tuned to recover most of the large model's transcription accuracy at substantially lower inference cost. It keeps the same encoder but cuts the decoder from 32 layers to 4, and it supports the same 99+ languages, with a small quality drop on some of them. MIT licensed and directly compatible with the Whisper inference pipeline in HuggingFace Transformers.

7,653,767 ↓ · 2,994 ♡

whisper-large-v3

Whisper Large-v3 is OpenAI's full-size ASR model supporting 99+ languages. It was trained on about 1 million hours of weakly labeled audio plus 4 million hours of audio pseudo-labeled with Whisper Large-v2 (the original Whisper release used 680,000 hours). It delivers strong transcription accuracy across languages at the cost of significant inference compute. Apache 2.0 licensed. The Large-v3-Turbo variant (a pruned, fine-tuned version with fewer decoder layers) provides similar quality at lower cost for most use cases.

4,984,608 ↓ · 5,654 ♡

wav2vec2-large-xlsr-53-russian

A Russian-language ASR model fine-tuned from Facebook's wav2vec2-large-xlsr-53 (pre-trained cross-lingually on 53 languages) on Russian speech from Mozilla Common Voice 6.0. It produces Russian text transcriptions from audio using CTC decoding. Community-contributed under Apache 2.0.

4,567,759 ↓ · 74 ♡
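
CTC decoding, mentioned above, turns per-frame predictions into text by taking the best symbol in each frame, collapsing consecutive repeats, and dropping the blank symbol. The sketch below shows the greedy variant on made-up scores; the four-symbol vocabulary and the score values are assumptions for illustration, and real models use a full character vocabulary and often a beam search or language model instead.

```python
def ctc_greedy_decode(frame_scores, vocab, blank_id=0):
    """Greedy CTC decoding.

    frame_scores: one list of per-symbol scores for each audio frame.
    vocab: symbol for each column; vocab[blank_id] is the CTC blank.
    """
    # 1. Pick the highest-scoring symbol in every frame.
    best_ids = [max(range(len(row)), key=row.__getitem__) for row in frame_scores]
    # 2. Collapse consecutive repeats, then 3. drop blanks.
    decoded = []
    previous = None
    for idx in best_ids:
        if idx != previous and idx != blank_id:
            decoded.append(vocab[idx])
        previous = idx
    return "".join(decoded)


# Hypothetical vocabulary and frame scores spelling the Russian word "да".
vocab = ["<blank>", "д", "а", "н"]
frames = [
    [0.10, 0.80, 0.05, 0.05],  # д
    [0.10, 0.70, 0.10, 0.10],  # д again: collapsed into the previous frame
    [0.90, 0.03, 0.03, 0.04],  # blank
    [0.10, 0.10, 0.70, 0.10],  # а
    [0.10, 0.10, 0.70, 0.10],  # а again: collapsed
]
text = ctc_greedy_decode(frames, vocab)
print(text)
```

A blank between two identical symbols keeps both of them, which is how CTC represents genuinely doubled letters.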

mms-300m-1130-forced-aligner

MMS-300M-1130-forced-aligner is a 300M-parameter wav2vec2-based model, derived from Meta's MMS (Massively Multilingual Speech) work and fine-tuned for forced alignment across 1,130 languages. It takes audio and a matching text transcript as input and outputs word- or phoneme-level timestamps, enabling subtitle synchronization and linguistic documentation at scale. The CC-BY-NC-4.0 license restricts commercial deployment.

3,619,474 ↓ · 87 ♡
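
Forced alignment, as described above, takes frame-level model scores and a known transcript and finds where each unit of the transcript occurs in time. The sketch below is a simplified dynamic-programming version under stated assumptions: it aligns characters rather than phonemes or words, it has no CTC blank symbol, and it assumes one frame per 20 ms. The function name, the two-letter vocabulary, and the probabilities are hypothetical and do not reflect this model's actual interface.

```python
import math


def forced_align(log_probs, transcript, vocab, frame_sec=0.02):
    """Monotonic alignment of `transcript` to frames (Viterbi-style).

    log_probs[t][v] is the log-probability of symbol v at frame t.
    Every frame is assigned to exactly one transcript character, in order,
    and every character gets at least one frame.
    Returns a list of (character, start_sec, end_sec).
    """
    tokens = [vocab[ch] for ch in transcript]
    n_frames, n_tokens = len(log_probs), len(tokens)
    if n_tokens == 0 or n_frames < n_tokens:
        raise ValueError("need a non-empty transcript and one frame per character")

    neg_inf = float("-inf")
    score = [[neg_inf] * n_tokens for _ in range(n_frames)]
    advanced = [[False] * n_tokens for _ in range(n_frames)]
    score[0][0] = log_probs[0][tokens[0]]
    for t in range(1, n_frames):
        for j in range(n_tokens):
            stay = score[t - 1][j]                            # same character
            move = score[t - 1][j - 1] if j > 0 else neg_inf  # next character
            best = max(stay, move)
            if best == neg_inf:
                continue  # state not reachable yet
            advanced[t][j] = move > stay
            score[t][j] = best + log_probs[t][tokens[j]]

    # Backtrack from the last character at the last frame.
    owner = [0] * n_frames
    j = n_tokens - 1
    for t in range(n_frames - 1, -1, -1):
        owner[t] = j
        if advanced[t][j]:
            j -= 1

    # Turn runs of frames into timestamped spans.
    spans, start = [], 0
    for t in range(1, n_frames + 1):
        if t == n_frames or owner[t] != owner[start]:
            spans.append((transcript[owner[start]],
                          round(start * frame_sec, 3),
                          round(t * frame_sec, 3)))
            start = t
    return spans


# Hypothetical scores for four frames over a two-symbol vocabulary.
vocab = {"h": 0, "i": 1}
probs = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9]]
log_probs = [[math.log(p) for p in row] for row in probs]
spans = forced_align(log_probs, "hi", vocab)
print(spans)
```

Character spans like these can be grouped into word-level timestamps by merging the spans between word boundaries.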

wav2vec2-large-xlsr-53-portuguese

wav2vec2-large-xlsr-53-portuguese is an XLSR-53 model fine-tuned on Portuguese Common Voice data for automatic speech recognition using CTC decoding on 16 kHz mono audio. It achieves competitive word error rates on both European and Brazilian Portuguese test sets. Part of the community XLSR fine-tuning effort organized by HuggingFace in 2021.

3,471,224 ↓ · 53 ♡
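
Word error rate (WER), the metric cited above, is the word-level edit distance between the model output and a reference transcript, divided by the number of reference words. The sketch below is a minimal implementation that splits on whitespace only; the example sentences are made up, and real evaluations usually normalize case and punctuation first.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # Levenshtein distance over words, keeping one row of the table at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = cur
    return prev[-1] / len(ref)


# Hypothetical reference and model output: one missing accent is one substitution.
wer = word_error_rate("o gato está na mesa", "o gato esta na mesa")
print(f"WER: {wer:.0%}")
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.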

voice-activity-detection

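voice-activity-detection is an open-source model listed under the automatic-speech-recognition task on HuggingFace; as its name indicates, it detects where speech occurs in audio rather than transcribing it. Details are sourced from the public model registry.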

3,113,633 ↓ · 232 ♡

speaker-diarization-community-1

speaker-diarization-community-1 is an open-source model listed under the automatic-speech-recognition task on HuggingFace; as its name indicates, it performs speaker diarization rather than transcription. Details are sourced from the public model registry.

2,836,290 ↓ · 346 ♡

whisper-small

whisper-small is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.

2,251,456 ↓ · 552 ♡

Qwen3-ASR-1.7B

Qwen3-ASR-1.7B is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.

1,898,223 ↓ · 784 ♡

whisper-base

whisper-base is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.

1,726,597 ↓ · 268 ♡

wav2vec2-large-xlsr-53-polish

wav2vec2-large-xlsr-53-polish is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.

1,545,495 ↓ · 12 ♡

distil-large-v3

distil-large-v3 is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.

1,279,209 ↓ · 376 ♡

wav2vec2-base-960h

wav2vec2-base-960h is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.

1,235,473 ↓ · 396 ♡

mms-1b-all

mms-1b-all is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.

1,158,651 ↓ · 198 ♡

faster-whisper-tiny.en

faster-whisper-tiny.en is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.

1,153,747 ↓ · 9 ♡

Voxtral-Mini-4B-Realtime-2602

Voxtral-Mini-4B-Realtime-2602 is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.

1,109,913 ↓ · 838 ♡

wav2vec2-large-xlsr-53-japanese

wav2vec2-large-xlsr-53-japanese is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.

1,083,424 ↓ · 56 ♡

wav2vec2-large-xlsr-53-chinese-zh-cn

wav2vec2-large-xlsr-53-chinese-zh-cn is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.

1,039,395 ↓ · 133 ♡

speaker-diarization

speaker-diarization is an open-source model listed under the automatic-speech-recognition task on HuggingFace; as its name indicates, it performs speaker diarization rather than transcription. Details are sourced from the public model registry.

879,833 ↓ · 1,268 ♡

wav2vec2-xls-r-300m-cv7-turkish

wav2vec2-xls-r-300m-cv7-turkish is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.

846,821 ↓ · 14 ♡

faster-whisper-tiny

faster-whisper-tiny is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.

830,338 ↓ · 19 ♡

whisper-medium

whisper-medium is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.

825,694 ↓ · 284 ♡

faster-whisper-large-v3

faster-whisper-large-v3 is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.

811,408 ↓ · 569 ♡

whisper-tiny

whisper-tiny is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.

802,648 ↓ · 427 ♡

parakeet-ctc-1.1b

parakeet-ctc-1.1b is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.

719,583 ↓ · 46 ♡

parakeet-tdt-0.6b-v3

parakeet-tdt-0.6b-v3 is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.

676,469 ↓ · 40 ♡

Wav2Vec2-large-xlsr-hindi

Wav2Vec2-large-xlsr-hindi is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.

675,944 ↓ · 12 ♡

faster-whisper-base

faster-whisper-base is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.

643,938 ↓ · 23 ♡

overlapped-speech-detection

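overlapped-speech-detection is an open-source model listed under the automatic-speech-recognition task on HuggingFace; as its name indicates, it detects regions where speakers overlap rather than transcribing speech. Details are sourced from the public model registry.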

628,781 ↓ · 56 ♡

reverb-diarization-v1

reverb-diarization-v1 is an open-source model listed under the automatic-speech-recognition task on HuggingFace; as its name indicates, it performs speaker diarization rather than transcription. Details are sourced from the public model registry.

627,408 ↓ · 13 ♡

VibeVoice-ASR

VibeVoice-ASR is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.

625,932 ↓ · 1,114 ♡

wav2vec2-large-xlsr-korean

wav2vec2-large-xlsr-korean is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.

605,223 ↓ · 55 ♡

wav2vec2-large-xlsr-53-arabic

wav2vec2-large-xlsr-53-arabic is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.

586,689 ↓ · 54 ♡

Qwen3-ASR-0.6B

Qwen3-ASR-0.6B is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.

542,556 ↓ · 285 ♡

speakerkit-pro

speakerkit-pro is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.

541,062 ↓ · 20 ♡

faster-whisper-small

faster-whisper-small is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.

503,535 ↓ · 31 ♡

w2v-xls-r-uk

w2v-xls-r-uk is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.

494,768 ↓ · 8 ♡

wav2vec2-xlsr-53-espeak-cv-ft

wav2vec2-xlsr-53-espeak-cv-ft is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.

471,866 ↓ · 49 ♡

wav2vec2-large-xlsr-53-th

wav2vec2-large-xlsr-53-th is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.

450,914 ↓ · 27 ♡

Phi-4-multimodal-instruct

Phi-4-multimodal-instruct is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.

427,637 ↓ · 1,597 ♡

canary-1b-flash

canary-1b-flash is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.

422,833 ↓ · 271 ♡

parakeet-tdt-0.6b-v3-coreml

parakeet-tdt-0.6b-v3-coreml is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.

420,190 ↓ · 41 ♡

hubert-large-ls960-ft

hubert-large-ls960-ft is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.

419,417 ↓ · 76 ♡

Qwen3-ForcedAligner-0.6B

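Qwen3-ForcedAligner-0.6B is an open-source model listed under the automatic-speech-recognition task on HuggingFace; as its name indicates, it aligns an existing transcript to audio rather than producing a transcript. Details are sourced from the public model registry.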

418,055 ↓ · 125 ♡

parakeet-tdt-0.6b-v2

parakeet-tdt-0.6b-v2 is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.

412,282 ↓ · 40 ♡

parakeet-tdt-0.6b-v3

parakeet-tdt-0.6b-v3 is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.

402,242 ↓ · 824 ♡

wav2vec2-lv-60-espeak-cv-ft

wav2vec2-lv-60-espeak-cv-ft is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.

395,202 ↓ · 67 ♡

wav2vec2-large-xlsr-53-dutch

wav2vec2-large-xlsr-53-dutch is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.

380,163 ↓ · 14 ♡

parakeetkit-pro

parakeetkit-pro is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.

375,705 ↓ · 4 ♡

granite-speech-3.3-2b

granite-speech-3.3-2b is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.

356,666 ↓ · 54 ♡

speaker-diarization-3.0

speaker-diarization-3.0 is an open-source model listed under the automatic-speech-recognition task on HuggingFace; as its name indicates, it performs speaker diarization rather than transcription. Details are sourced from the public model registry.

340,211 ↓ · 215 ♡

wav2vec2-indonesian-javanese-sundanese

wav2vec2-indonesian-javanese-sundanese is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.

340,026 ↓ · 12 ♡

T-one

T-one is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.

293,020 ↓ · 90 ♡

wav2vec2-large-xlsr-53-greek

wav2vec2-large-xlsr-53-greek is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.

292,912 ↓ · 3 ♡

nb-wav2vec2-1b-nynorsk

nb-wav2vec2-1b-nynorsk is an open-source automatic-speech-recognition model available on HuggingFace. Details are sourced from the public model registry.

284,389 ↓ · 0 ♡