AI Tools.

audio classification models

9 models · ranked by HuggingFace downloads

clap-htsat-fused

LAION's CLAP (Contrastive Language-Audio Pretraining) model, which pairs an HTSAT (Hierarchical Token-Semantic Audio Transformer) audio encoder with a text encoder to align audio and text in a shared embedding space. The "fused" variant adds a feature-fusion mechanism for handling variable-length audio inputs. Analogous to CLIP for images, it enables zero-shot audio classification and retrieval from natural-language descriptions, without task-specific labeled audio data.

18,153,697 ↓ · 82 ♡
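
The zero-shot idea described above can be illustrated with a minimal sketch: score an audio embedding against the embeddings of candidate text labels by cosine similarity, then normalize the scores with a softmax. The vectors, the labels, the `temperature` value, and the helper names `cosine` and `zero_shot_classify` are all hypothetical and chosen for illustration; this is not the model's actual API, and real embeddings would have many more dimensions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(audio_emb, label_embs, temperature=0.1):
    """Return a probability per label from similarity in a shared space.

    `temperature` is an assumed scaling factor for this sketch, not a
    value taken from the model.
    """
    sims = {label: cosine(audio_emb, emb) for label, emb in label_embs.items()}
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(sims.values())
    exps = {label: math.exp((s - top) / temperature) for label, s in sims.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Toy 3-dimensional embeddings (hypothetical values, for illustration only).
label_embs = {
    "a dog barking": [0.9, 0.1, 0.0],
    "rain falling": [0.0, 0.8, 0.6],
    "a person speaking": [0.1, 0.2, 0.97],
}
audio_emb = [0.85, 0.2, 0.05]

probs = zero_shot_classify(audio_emb, label_embs)
best = max(probs, key=probs.get)
print(best)
```

In practice the audio and text embeddings would come from the model's two encoders, and the candidate labels can be any natural-language descriptions, which is why no task-specific labeled audio is needed.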

wav2vec2-large-robust-24-ft-age-gender

wav2vec2-large-robust-24-ft-age-gender is an open-source audio-classification model available on HuggingFace. Details are sourced from the public model registry.

1,048,077 ↓ · 50 ♡

wav2vec2-large-robust-12-ft-emotion-msp-dim

wav2vec2-large-robust-12-ft-emotion-msp-dim is an open-source audio-classification model available on HuggingFace. Details are sourced from the public model registry.

1,025,791 ↓ · 159 ♡

wav2vec-vm-finetune

wav2vec-vm-finetune is an open-source audio-classification model available on HuggingFace. Details are sourced from the public model registry.

869,595 ↓ · 11 ♡

emotion-recognition-wav2vec2-IEMOCAP

emotion-recognition-wav2vec2-IEMOCAP is an open-source audio-classification model available on HuggingFace. Details are sourced from the public model registry.

607,362 ↓ · 184 ♡

music_genres_classification

music_genres_classification is an open-source audio-classification model available on HuggingFace. Details are sourced from the public model registry.

577,056 ↓ · 38 ♡

ast-finetuned-audioset-10-10-0.4593

ast-finetuned-audioset-10-10-0.4593 is an open-source audio-classification model available on HuggingFace. Details are sourced from the public model registry.

574,720 ↓ · 353 ♡

WeSpeaker-ResNet34-LM-MLX

WeSpeaker-ResNet34-LM-MLX is an open-source audio-classification model available on HuggingFace. Details are sourced from the public model registry.

325,817 ↓ · 2 ♡