Google is hyping up its new Gemini 2.0 models. The first model, Gemini 2.0 Flash, is already live and powers new AI agent experiences like Project Astra and Project Mariner.

Google is ending 2024 with a bang. On Wednesday, the Mountain View giant announced a slew of AI news, including the release of Gemini 2.0, a new language model with advanced multimodal capabilities. The new model kicks off what Google calls the “agentic era,” in which virtual AI agents will be able to perform tasks on your behalf.
Initially, Google has released just one model in the Gemini 2.0 family: Gemini 2.0 Flash Experimental, a super-fast, lightweight model that supports multimodal input and output. It can natively generate images mixed with text and multilingual audio, and it can natively call tools such as Google Search and code execution. These features are currently in preview for developers and beta testers. Despite being the smaller model, 2.0 Flash outperforms Gemini 1.5 Pro in multiple areas, including factuality, reasoning, coding, and math, while also running twice as fast. Regular users can try out the chat-optimized version of Gemini 2.0 Flash on the web starting today, and it will soon appear in the Gemini mobile app.
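For developers, the preview is reachable through the Gemini API. The snippet below is a minimal sketch of a plain text call using Google’s google-generativeai Python SDK; the model ID gemini-2.0-flash-exp reflects the experimental release, and both it and the placeholder API key are assumptions you should check against the current documentation.

```python
# Minimal sketch: calling the Gemini 2.0 Flash experimental preview through
# the Gemini API using the google-generativeai Python SDK.
# The model ID "gemini-2.0-flash-exp" is an assumption based on the
# experimental release and may change; substitute your own API key.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

model = genai.GenerativeModel("gemini-2.0-flash-exp")
response = model.generate_content(
    "In two sentences, what is new in Gemini 2.0 Flash compared to Gemini 1.5 Pro?"
)
print(response.text)
```

Since the multimodal output features (native image and audio generation) are still gated to early-access testers, a simple text prompt like this is the easiest way to confirm you can reach the new model at all.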
Google also showed off several impressive experiences built with Gemini 2.0. The first is an updated version of Project Astra, the experimental virtual AI agent that Google first showed off in May 2024. With Gemini 2.0, it can now hold conversations in multiple languages; use tools like Google Search, Lens, and Maps; remember the content of past conversations; and understand language at roughly the latency of human conversation. Project Astra is designed to run on smartphones and glasses, but it is currently limited to a small group of trusted testers. People interested in trying out the prototype on an Android phone can join the waitlist here. There’s also a really cool real-time multimodal API demo, which is a bit like Project Astra, letting you interact with a chatbot in real time via video, voice, and screen sharing.
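That real-time demo sits on top of the streaming side of the Gemini API, which ships with Google’s newer google-genai Python SDK. The sketch below shows a text-only session purely as an illustration; the model ID, the v1alpha API version, and the session methods are assumptions drawn from the early preview and may change, so treat it as a starting point rather than a reference.

```python
# Rough sketch: opening a real-time session with Gemini 2.0 Flash via the
# google-genai Python SDK's live API. Text-only here for simplicity; in the
# preview the same session type also carries voice, video, and screen sharing.
# Model ID, API version, and method names are assumptions from the early
# preview and may change.
import asyncio
from google import genai

client = genai.Client(api_key="YOUR_API_KEY", http_options={"api_version": "v1alpha"})
config = {"response_modalities": ["TEXT"]}

async def main():
    async with client.aio.live.connect(model="gemini-2.0-flash-exp", config=config) as session:
        await session.send(input="Hello Gemini, can you hear me?", end_of_turn=True)
        async for message in session.receive():
            if message.text:
                print(message.text, end="")

asyncio.run(main())
```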
Next up is Project Mariner, an experimental Chrome browser extension that browses the internet and performs tasks for you. Available now to select testers in the US, the extension leverages Gemini 2.0’s multimodal capabilities to “understand and reason about information on the browser screen, including pixels and web elements like text, code, images, and forms.” Google admits the technology is still in its infancy and isn’t always reliable. But even in its current prototype form, it’s impressive, as you can see for yourself in the YouTube demo.
Google also announced Jules, an AI coding agent built on Gemini 2.0. It integrates directly into your GitHub workflow, and the company says it can handle bug fixes and repetitive, time-consuming tasks “while you focus on what you actually want to build.”
For now, many of the new announcements are limited to early testers and app developers. Google says it plans to integrate Gemini 2.0 into its portfolio of products, including Search, Workspace, Maps, and more, early next year. By then, we’ll have a better idea of how these new multimodal features and improvements translate to real-world use cases. There’s no word yet on Gemini 2.0 Ultra and Pro models.