Audio Models
Speech recognition, text-to-speech, and audio processing models
12 models available
Whisper Large V3
1.55B parameters · Rating 9.2 · OpenAI
State-of-the-art speech recognition model supporting 99 languages with exceptional accuracy.
Whisper Medium
769M parameters · Rating 8.5 · OpenAI
Balanced speech recognition model offering good accuracy with reasonable resource usage.
Whisper Small
244M parameters · Rating 7.8 · OpenAI
Compact speech recognition model for edge deployment and real-time applications.
Bark
1B parameters · Rating 8.1 · Suno AI
Text-to-audio model that generates speech, music, and sound effects.
MusicGen
1.5B parameters · Rating 8.0 · Meta
Controllable music generation model that creates high-quality audio from text.
AudioCraft
1.5B parameters · Rating 8.2 · Meta
Suite of audio generation models for music, sound effects, and compression.
Seamless M4T
2.3B parameters · Rating 8.7 · Meta
Massively multilingual and multimodal translation model.
Whisper Tiny
39M parameters · Rating 7.2 · OpenAI
Ultra-compact speech recognition model for extreme edge deployment.
Whisper Base
74M parameters · Rating 7.5 · OpenAI
Balanced speech recognition model for general use.
Tortoise TTS
1B parameters · Rating 8.4 · Tortoise Team
High-quality text-to-speech with voice cloning capabilities.
Coqui XTTS
500M parameters · Rating 8.2 · Coqui
Multilingual text-to-speech with voice cloning in 17 languages.
Riffusion
0.98B parameters · Rating 7.7 · Riffusion
Fine-tune of Stable Diffusion that generates music by producing spectrogram images.