ElevenLabs Launches Scribe for Enhanced AI Transcription

ElevenLabs recently raised $180 million in funding and is now ranked among the leaders in audio generation. The company is now moving in a new direction with the launch of its first stand-alone speech-to-text model, Scribe.

The launch has surprised many in the industry, since ElevenLabs, valued at $3.3 billion, has specialized in text-to-speech with an extensive library of voices. It is now extending its capabilities into speech recognition, making it a rival to established competitors such as Gladia, Speechmatics, AssemblyAI, Deepgram, and OpenAI’s Whisper models.

Scribe supports more than 99 languages at launch. The company says that more than 25 of these reach excellent accuracy, with word error rates below 5%.
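For readers unfamiliar with the metric, word error rate (WER) is conventionally the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference text, divided by the number of words in the reference. The sketch below is a minimal illustration of that standard definition, not ElevenLabs' evaluation code. The function name `word_error_rate` and the simple lowercase, whitespace-based tokenization are assumptions made for the example; the article does not say how the company or the benchmarks normalize text.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumption: lowercase and split on whitespace; real evaluations
    # usually apply more careful text normalization.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words processed so far
    # (initially none) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from transcript
                curr[j - 1] + 1,     # insertion: extra word in transcript
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


reference = "the quick brown fox jumps over the lazy dog"
hypothesis = "the quick brown fox jumped over the lazy dog"
print(f"WER: {word_error_rate(reference, hypothesis):.1%}")  # one error in nine words
```

Under this definition, a word error rate below 5% means fewer than one word-level error for every 20 words of reference text.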

Languages in this top tier include English, French, German, Hindi, Indonesian, Japanese, Kannada, Malayalam, Polish, Portuguese, Spanish, and Vietnamese.

The remaining languages are grouped into high, good, and moderate accuracy tiers according to their word error rates. On benchmarks such as FLEURS and Common Voice, ElevenLabs claims that Scribe outperforms both Google Gemini 2.0 Flash and Whisper Large V3 across multiple languages.

Explaining the move in a recent conversation, CEO Mati Staniszewski said, “Overall, detection needs to be improved. We want to understand what’s being said to us in a conversation better. We are working on ways to move away from only generating content and toward understanding and transcribing speech.”

Staniszewski further added, “Many people say that speech-to-text is a solved problem. But for many languages, it is pretty bad. We think we can build better speech detection models because we have in-house teams to annotate data and give us quick feedback.”

Aimee Pearcy

Tech Journalist
