This report introduces Dolphin, a large-scale multilingual automatic speech recognition (ASR) model that extends the Whisper architecture to support a wider range of languages. Our approach integrates in-house proprietary and open-source datasets to refine and optimize Dolphin's performance. The model is specifically designed to achieve notable recognition accuracy for 40 Eastern languages across East Asia, South Asia, Southeast Asia, and the Middle East, while also supporting 22 Chinese dialects. Experimental evaluations show that Dolphin significantly outperforms current state-of-the-art open-source models across various languages. To promote reproducibility and community-driven innovation, we are making our trained models and inference source code publicly available.
Guess ‘diversity’ in language wasn’t important for either the Anglophone world or Saltman. Good for Asia, but AFAIK we still lack decent support for Africa, the Middle East, and a shitload of smaller languages that Western corps didn’t bother adding.
Technically it supports fewer languages than Whisper: 40 vs. 99.
The main problem isn’t “bother”, it’s training data. You need hundreds of thousands of hours of high-quality transcripts to train models like these, and that just doesn’t exist for, like, Zulu or whatever.