Google DeepMind has announced significant advances in its Gemini family of AI models, including the introduction of Gemini 1.5 Flash, a lighter, faster model built for high-volume, high-frequency tasks that demand cost efficiency and a long context window. Demis Hassabis, CEO of Google DeepMind, shared the updates on behalf of the Gemini team at the recently held Google I/O conference.
In December 2023, Google DeepMind launched the natively multimodal Gemini 1.0 in three sizes: Ultra, Pro, and Nano. Shortly after, Gemini 1.5 Pro followed, featuring enhanced performance and an unprecedented 1 million token context window. The model has been well received by developers and enterprise customers for its strong multimodal reasoning capabilities and overall performance.
Based on user feedback indicating a need for lower latency and cost-effective solutions, Google DeepMind has now introduced Gemini 1.5 Flash. This model is optimized for efficient large-scale operations, maintaining a long context window of 1 million tokens. Both 1.5 Pro and 1.5 Flash are available in public preview via Google AI Studio and Vertex AI, with 1.5 Pro also offering a 2 million token context window for select developers and Google Cloud customers.
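For developers, access is through the Gemini API. Below is a minimal sketch of a single-turn request to Gemini 1.5 Flash using the Google AI Python SDK (the `google-generativeai` package); the model name, prompt, and placeholder API key are illustrative, and availability may vary by region and API version.

```python
import google.generativeai as genai

# Configure the SDK with an API key obtained from Google AI Studio.
genai.configure(api_key="YOUR_API_KEY")

# Instantiate the lighter, lower-latency model announced at I/O.
model = genai.GenerativeModel("gemini-1.5-flash")

# A single-turn request; the long context window means the contents could
# also include very large documents, transcripts, or codebases.
response = model.generate_content(
    "Summarize the key announcements from Google I/O in three bullet points."
)
print(response.text)
```

The same call pattern works against Gemini 1.5 Pro by swapping the model name; Vertex AI exposes the models through its own client libraries.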
Alongside the 1.5 series, Google DeepMind previewed Gemma 2, the next generation of its open models, and shared progress on Project Astra, its effort toward advanced AI assistants. Within the Gemini family, 1.5 Flash is the fastest model, excelling at tasks such as summarization, chat applications, image and video captioning, and data extraction. Although lighter than 1.5 Pro, it retains strong multimodal reasoning because it was trained through distillation, transferring the most essential knowledge and skills from the larger model.
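Because Flash is multimodal, the same SDK call accepts mixed text and image inputs. The sketch below shows image captioning under the assumption that Pillow is installed and that `photo.jpg` is a local file chosen for illustration.

```python
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")

# Load a local image; any PIL image object can be passed alongside text.
image = Image.open("photo.jpg")

# A mixed text + image request exercises the multimodal reasoning
# distilled from 1.5 Pro.
response = model.generate_content(
    ["Write a one-sentence caption for this image.", image]
)
print(response.text)
```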
The improvements to 1.5 Pro include extending its context window to 2 million tokens and advances in code generation, logical reasoning, and multi-turn conversation. It also offers enhanced audio and image understanding, follows more complex and nuanced instructions, and can tailor its output to specific use cases. Google DeepMind is integrating 1.5 Pro into various Google products, including Gemini Advanced and Workspace apps. Gemini Nano is expanding beyond text to accept image inputs, enabling applications to process information in a more human-like manner.
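The multi-turn conversation improvements map onto the SDK's chat helper. A minimal sketch follows, using Gemini 1.5 Pro; the questions are placeholders, and the empty starting history is an assumption rather than a requirement.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro")

# start_chat keeps the running history so later turns can build on earlier ones.
chat = model.start_chat(history=[])

first = chat.send_message("Explain what a context window is in one paragraph.")
print(first.text)

# The follow-up relies on the prior turn without restating it.
follow_up = chat.send_message("Now give a concrete example involving a very long document.")
print(follow_up.text)
```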
Gemma 2, the next generation of open models, boasts a new architecture for superior performance and efficiency. The Gemma family is expanding with PaliGemma, a vision-language model inspired by PaLI-3, and updates to the Responsible Generative AI Toolkit.
Project Astra showcases Google DeepMind’s vision for the future of AI assistants. These prototype agents can process information rapidly, combining video and speech inputs into a cohesive timeline and responding with natural intonations. This technology aims to create AI assistants capable of understanding and interacting with the world like humans, potentially integrating into everyday devices like phones and glasses.
Hassabis emphasized that Google DeepMind is committed to advancing AI responsibly to benefit humanity, continuously pushing the boundaries of innovation and exploring new use cases for the Gemini models.