With its latest update, Gemini 1.5 Pro can now hear you.
Yes, Google has updated Gemini 1.5 Pro, its most powerful AI model, so that it can now understand the audio in audio and video files.
The update was announced at Google Cloud Next. The model can now extract information from audio, such as earnings calls or the soundtrack of a video, without needing a written transcript.
This means you could give it a video, such as a documentary, and ask it questions about the clip or about the audio at any moment within it.
The update is part of Google's push toward multimodal models that can understand a variety of input types beyond text.
This is possible because Gemini models are trained on audio, video, text, and code at the same time.
Gemini 1.5 Pro is available in public preview to those with access to Vertex AI. There is no word yet on a broader release. In the meantime, you can use Gemini models through Google's Gemini chatbot.
Gemini is not the only model getting an update. Imagen 2, Google's text-to-image generation model, will also add inpainting and outpainting.