Better Than OpenAI Whisper – Google’s New Speech Recognition API

Recently, we wrote about OpenAI’s groundbreaking speech recognition tool Whisper. At the time, it had just beaten Google’s best speech recognition API.

But it didn’t take long for Google to catch up: 🚀 👇

Say hello to the Universal Speech Model (USM), a speech model that recognizes and translates speech across more than 300 languages! With 2 billion parameters and 12 million hours of training speech, USM handles everything from widely spoken languages like English and Mandarin to lesser-known ones like Balinese, Shona, and Xhosa.

USM is built for use on YouTube, where it can generate closed captions so that people worldwide can follow videos in their own language.

But how does it work with so many languages, especially those with fewer speakers? The secret lies in pre-training the model on a huge unlabeled dataset spanning many languages, then fine-tuning it on a much smaller set of labeled data. Because most of the learning happens without labels, USM can adapt to new languages and data even when transcribed speech is scarce.

When tested on YouTube captions in 73 languages, USM achieved an average word error rate (WER) of less than 30%. WER is the share of words the model gets wrong, so lower is better. USM even outperformed the recently released Whisper model, which was trained on 400,000 hours of labeled data!
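To make the WER number more concrete, here is a minimal Python sketch of how the metric is commonly computed: count the word-level substitutions, deletions, and insertions needed to turn the model's transcript into the reference transcript, then divide by the number of reference words. The function name `word_error_rate` and the plain whitespace tokenization are our own assumptions for illustration. This is not Google's evaluation code, and real benchmarks usually normalize casing and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative helper (hypothetical name), using naive whitespace
    tokenization and no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word ("sit") and one dropped word ("the"):
    # 2 errors over 6 reference words.
    wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
    print(f"WER: {wer:.2f}")  # WER: 0.33
```

A WER below 30% therefore means fewer than 3 errors per 10 reference words. Note that WER can exceed 100% when the transcript contains many extra words.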

USM isn’t just for YouTube, though. It also excels at tasks like speech translation, scoring higher on the BLEU metric (which measures translation quality) than Whisper. It’s a game-changer for understanding and translating speech in a wide range of languages, making communication easier and more accessible than ever before.
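If you are curious what BLEU measures, the sketch below shows the core idea in Python: it checks how many word n-grams of the translation also appear in a reference translation, and penalizes translations that are too short. The name `simple_bleu` is hypothetical, and this is a simplified sentence-level version with a single reference and no smoothing. Published BLEU scores are computed over whole test sets with standard tooling, so treat this as an illustration of the idea, not as the scoring used in the USM paper.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(reference: str, hypothesis: str, max_n: int = 4) -> float:
    """Simplified sentence-level BLEU in the range 0..1 (higher is better).

    Assumptions: one reference, whitespace tokenization, no smoothing,
    so any n-gram order without a match gives a score of 0.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not hyp:
        return 0.0

    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngram_counts(hyp, n)
        ref_counts = ngram_counts(ref, n)
        total = sum(hyp_counts.values())
        # Clip each n-gram count so repeated words cannot be over-credited.
        matched = sum(min(count, ref_counts[gram])
                      for gram, count in hyp_counts.items())
        if total == 0 or matched == 0:
            return 0.0
        log_precision_sum += math.log(matched / total) / max_n

    # Brevity penalty: translations shorter than the reference are scaled down.
    if len(hyp) >= len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(hyp))
    return brevity_penalty * math.exp(log_precision_sum)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    print(simple_bleu(reference, "the cat sat on the mat"))  # perfect match
    print(simple_bleu(reference, "the cat sat on"))          # correct but too short
```

The second call scores lower than the first even though every word is correct, because the brevity penalty kicks in. BLEU is often reported multiplied by 100.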

So, whether you want to enjoy videos in your native language or learn a new one, USM is here to make it possible. Get ready to embrace a world of languages!

If you’re a coder, you can request API access here and check out the product release page here. The research paper is available on arXiv:

πŸ“ Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages

