Superpower Daily
Google I/O 2024 - it’s AI time

Ilya and OpenAI are going to part ways

Saeed Ezzati
May 15, 2024

In today’s email:

🤳🏻 Google announced Project Astra: An AI agent that can see AND hear what you do live in real-time.
🎥 Google Veo, a serious swing at AI-generated video
👀 Why is it so dangerous for AI to learn how to lie?
🧰 12 new AI-powered tools and resources. Make sure to check the online version for the full list of tools.

Google announced Project Astra: An AI agent that can see AND hear what you do live in real-time.

At the I/O 2024 event, Google introduced Project Astra, a new AI application designed to help with everyday tasks. The app uses your phone's camera and microphone together with AI to identify and recall objects, sounds, and more. In the demonstrations, it recognized a speaker, explained its parts, and created alliterations on demand. Astra can also remember items it has seen, as shown when it recalled where a pair of glasses had been left after they were out of the camera's view. It offered practical suggestions too, such as speeding up a system by adding a cache.

The video presentation hinted at the return of Google Glass, with the tester using wearable glasses to interact with Astra. These glasses scanned the surroundings and provided real-time information and suggestions. The demo highlighted the AI's quick response time and expressive voice, and Google DeepMind CEO Demis Hassabis emphasized the goal of creating universal AI agents that can process and recall information efficiently. This progress marks a significant step in multimodal understanding, although challenges remain in making these interactions feel conversational.

Currently, Project Astra is still in its early stages with no official launch date. However, Hassabis mentioned that some capabilities will be available in the Gemini app later this year. The integration of such advanced AI features into everyday devices promises a future where AI assistants become even more helpful and intuitive, potentially through phones or smart glasses.

Ilya and OpenAI are going to part ways

OpenAI co-founder Ilya Sutskever announced his departure from the Microsoft-backed startup on Tuesday, expressing enthusiasm for a new personal project. The decision follows the leadership upheaval of November 2023, when CEO Sam Altman was briefly ousted over alleged communication issues with the board. The conflict highlighted differing priorities within the company, with Sutskever focusing on AI safety while Altman pushed for rapid technological advancement. Following Altman’s reinstatement, several directors, including Sutskever, left the board, and new board members such as Bret Taylor and Larry Summers were appointed.

Altman expressed sadness over Sutskever’s departure, praising him as a brilliant mind and dear friend. Jakub Pachocki, a long-time research director at OpenAI, will take over as chief scientist. Jan Leike, co-leader of the Superalignment team dedicated to AI safety, is also leaving the company. Meanwhile, OpenAI recently launched a new AI model, GPT-4o, which is faster and more capable across text, audio, and images. The company also plans to introduce video chat functionality for ChatGPT.

OpenAI has been through a period of transformation in both leadership and technology. The reconstituted board, which includes former Salesforce co-CEO Bret Taylor and ex-Treasury Secretary Larry Summers, is meant to steer the company forward. Amid these changes, OpenAI has kept shipping, releasing a desktop version of ChatGPT alongside the GPT-4o upgrade.

If 1 million tokens is a lot, how about 2 million?

During the Google I/O 2024 developer conference, Google announced a private preview of a version of Gemini 1.5 Pro, its latest generative AI model, that can analyze up to 2 million tokens. This capacity, double the previous limit, lets Gemini 1.5 Pro handle the largest input of any commercially available model, surpassing Anthropic’s Claude 3. Two million tokens correspond to around 1.4 million words, two hours of video, or 22 hours of audio, and the longer context can improve performance in tasks such as code generation, logical reasoning, and multi-turn conversation.
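
To put those numbers in perspective, here is a quick back-of-the-envelope sketch in Python. It only divides the figures quoted above to get the average ratios they imply. The constant names and the estimate_tokens helper are made up for illustration, and real token counts depend on the content and on Google's tokenizer.

```python
# Figures quoted in the article for a 2-million-token context window.
CONTEXT_TOKENS = 2_000_000
WORDS = 1_400_000
VIDEO_HOURS = 2
AUDIO_HOURS = 22

# Average ratios implied by those figures (not official conversion rates).
words_per_token = WORDS / CONTEXT_TOKENS
video_tokens_per_second = CONTEXT_TOKENS / (VIDEO_HOURS * 3600)
audio_tokens_per_second = CONTEXT_TOKENS / (AUDIO_HOURS * 3600)


def estimate_tokens(word_count: int) -> int:
    """Rough token estimate for text, using the implied words-per-token ratio."""
    return round(word_count / words_per_token)


if __name__ == "__main__":
    print(f"words per token:         {words_per_token:.2f}")
    print(f"video tokens per second: {video_tokens_per_second:.0f}")
    print(f"audio tokens per second: {audio_tokens_per_second:.0f}")
    print(f"tokens for 70,000 words: {estimate_tokens(70_000)}")
```

By this rough math, a token is a bit less than one word, and a second of video costs roughly ten times as many tokens as a second of audio.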

Additionally, Google introduced Gemini 1.5 Flash, a streamlined version designed for less demanding, high-frequency generative AI workloads. Despite being more efficient, Flash retains the 2-million-token context window and is suited for tasks like summarization, image and video captioning, and data extraction from long documents. Both models are multimodal, capable of analyzing audio, video, and images in addition to text. Flash aims to offer speed and efficiency, making it ideal for applications where rapid output is crucial.

Google also unveiled new features to make its AI models more cost-effective and useful. Context caching allows developers to store large amounts of information for quick and affordable access by Gemini models. The Batch API, now in public preview on Vertex AI, handles multiple prompts in a single request, further reducing costs. Additionally, controlled generation, set to arrive later in the month, will let users define specific formats for model outputs, which should make long-context applications more practical.
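
For readers who want a feel for why caching and batching save money, here is a toy Python sketch of the idea. It is not the Gemini or Vertex AI API: ContextCache, answer_batch, and the use of str.split as a stand-in for expensive context processing are all hypothetical. The point is only that a large document is processed once and then reused across many prompts sent together.

```python
import hashlib
from typing import Callable, Dict, List


class ContextCache:
    """Hypothetical in-memory stand-in for a server-side context cache."""

    def __init__(self, preprocess: Callable[[str], List[str]]):
        self._preprocess = preprocess          # the "expensive" step
        self._store: Dict[str, List[str]] = {}
        self.preprocess_calls = 0              # counts how often we paid for it

    def get(self, document: str) -> List[str]:
        # Key on the document's content so a changed document is reprocessed.
        key = hashlib.sha256(document.encode("utf-8")).hexdigest()
        if key not in self._store:
            self.preprocess_calls += 1
            self._store[key] = self._preprocess(document)
        return self._store[key]


def answer_batch(cache: ContextCache, document: str, prompts: List[str]) -> List[str]:
    """Answer several prompts in one call, sharing one cached context."""
    context = cache.get(document)
    # Placeholder "model": a real system would run inference here.
    return [f"{p} (answered over {len(context)} cached tokens)" for p in prompts]


if __name__ == "__main__":
    cache = ContextCache(str.split)
    doc = "a very long contract that would be costly to resend"
    print(answer_batch(cache, doc, ["Who signs?", "When does it end?"]))
    print(answer_batch(cache, doc, ["What is the fee?"]))
    print("times the document was processed:", cache.preprocess_calls)
```

In this sketch the document is processed a single time no matter how many batches of questions follow, which is the cost saving the two features are aiming at.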

Google Veo, a serious swing at AI-generated video

At Google I/O 2024, Google introduced Veo, an advanced AI model capable of generating 1080p video clips up to a minute long from text prompts. Veo, which is seen as a competitor to OpenAI's Sora, can produce various visual styles such as landscapes and time lapses and can edit existing footage. Demis Hassabis from DeepMind highlighted ongoing explorations in features like storyboarding and generating longer scenes, showcasing Veo's capabilities. Unlike its predecessor based on Imagen 2, which created only low-resolution, short videos, Veo demonstrates significant advancements in video generation quality and complexity.

Veo was trained on extensive video footage, potentially including content from YouTube, though Google has not disclosed specifics. Training on such large datasets lets the model learn patterns that it then uses to create new videos. While Google says it adheres to ethical standards, concerns persist about how the data was sourced and about the model reproducing training data verbatim. Despite these issues, Veo has already been made available to select creators like Donald Glover, and Google is positioning it as a powerful tool for the creative industry.

Douglas Eck from DeepMind emphasized Veo's technical strengths, including its ability to understand camera movements and visual effects from prompts, as well as physics elements like fluid dynamics and gravity. The model supports masked editing and can generate videos from still images. However, Veo is not without flaws, such as occasional inconsistencies in object behavior and physics errors. Currently, Veo is accessible via a waitlist on Google Labs, with plans to integrate its capabilities into YouTube Shorts and other products as it continues to improve.

Other announcements by Google:
- Google mentioned ‘AI’ 120+ times during its I/O keynote
- Google is redesigning its search engine — and it’s AI all the way down
- Google is building its Gemini Nano AI model into Chrome on the desktop
- Gemini comes to Gmail to summarize, draft emails, and more
- Circle to Search is now a better homework helper