Google I/O 2024 - it’s AI time
Ilya and OpenAI are going to part ways
In today’s email:
🤳🏻 Google announced Project Astra: An AI agent that can see AND hear what you do live in real-time.
🎥 Google Veo, a serious swing at AI-generated video
👀 Why is it so dangerous for AI to learn how to lie?
🧰 12 new AI-powered tools and resources. Make sure to check the online version for the full list of tools.
At the I/O 2024 event, Google introduced Project Astra, a new AI assistant designed to help with everyday tasks. It uses your phone's camera and AI to identify and recall objects, sounds, and more. In the demo, it recognized a loudspeaker, explained its parts, and came up with alliterations on demand. Astra can also remember things it saw earlier, as shown when it located a pair of glasses that were no longer in the camera's view. It offers practical suggestions too, such as adding a cache to speed up a system, and can respond creatively to the objects it identifies.
The video presentation hinted at the return of Google Glass, with the tester using wearable glasses to interact with Astra. These glasses scanned the surroundings and provided real-time information and suggestions. The demo highlighted the AI's quick response time and expressive voice, and Google DeepMind CEO Demis Hassabis emphasized the goal of creating universal AI agents that can process and recall information efficiently. This marks a significant step in multimodal understanding, although making these interactions feel truly conversational remains a challenge.
Currently, Project Astra is still in its early stages with no official launch date. However, Hassabis mentioned that some capabilities will be available in the Gemini app later this year. The integration of such advanced AI features into everyday devices promises a future where AI assistants become even more helpful and intuitive, potentially through phones or smart glasses.
OpenAI co-founder Ilya Sutskever announced his departure from the Microsoft-backed startup on Tuesday, expressing enthusiasm for a new personal project. The decision follows the leadership upheaval of November 2023, when CEO Sam Altman was briefly ousted over alleged communication issues with the board. The conflict highlighted differing priorities within the company, with Sutskever focusing on AI safety while Altman pushed for rapid technological advancement. Following Altman’s reinstatement, several board members, including Sutskever, left the board, and new members such as Bret Taylor and Larry Summers were appointed.
Altman expressed sadness over Sutskever’s departure, praising him as a brilliant mind and dear friend. Jakub Pachocki, a long-time research director at OpenAI, will take over as chief scientist. Alongside Sutskever, Jan Leike, co-leader of the Superalignment team dedicated to AI safety, also left the company. OpenAI continues to evolve, recently launching a new AI model, GPT-4o, which is faster and more capable in various media formats. The company also plans to introduce video chat functionality for ChatGPT, enhancing its user interface and accessibility.
OpenAI has been through a period of transformation in both leadership and technology. The board reshuffle brought in former Salesforce co-CEO Bret Taylor and ex-Treasury Secretary Larry Summers to help steer the company forward. Amid these changes, OpenAI has kept shipping, including a desktop version of ChatGPT and GPT-4o, its upgrade to GPT-4.
During the Google I/O 2024 developer conference, Google announced a private preview of Gemini 1.5 Pro, its latest generative AI model, with a context window of up to 2 million tokens. That is double the previous limit and lets Gemini 1.5 Pro handle the largest input of any commercially available model, surpassing Anthropic’s Claude 3. The larger window can hold around 1.4 million words, two hours of video, or 22 hours of audio, and Google says the model also performs better on tasks such as code generation, logical reasoning, and multi-turn conversation.
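To put those figures in perspective, here is a quick back-of-the-envelope sketch in Python. It uses only the round numbers quoted above (2 million tokens, roughly 1.4 million words, two hours of video, 22 hours of audio). The resulting rates are implied by those round numbers and are not official tokenization figures, and the function names are made up for illustration.

```python
import math

# Round figures quoted in the story above (not official tokenizer rates).
CONTEXT_TOKENS = 2_000_000
APPROX_WORDS = 1_400_000
VIDEO_HOURS = 2
AUDIO_HOURS = 22


def implied_rates():
    """Rates implied by the quoted figures."""
    return {
        "words_per_token": APPROX_WORDS / CONTEXT_TOKENS,
        "video_tokens_per_second": CONTEXT_TOKENS / (VIDEO_HOURS * 3600),
        "audio_tokens_per_second": CONTEXT_TOKENS / (AUDIO_HOURS * 3600),
    }


def estimate_tokens(word_count):
    """Rough token estimate for English text, using the implied ratio."""
    return math.ceil(word_count * CONTEXT_TOKENS / APPROX_WORDS)


def fits_in_context(word_count, limit=CONTEXT_TOKENS):
    """True if the estimated token count is within the context window."""
    return estimate_tokens(word_count) <= limit


if __name__ == "__main__":
    for name, value in implied_rates().items():
        print(f"{name}: {value:.2f}")
    print("700-word email:", estimate_tokens(700), "tokens (estimated)")
```

By this rough estimate a word costs a bit more than one token, so real documents may come out higher or lower depending on language and formatting.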
Additionally, Google introduced Gemini 1.5 Flash, a streamlined version designed for less demanding, high-frequency generative AI workloads. Despite being more efficient, Flash retains the 2-million-token context window and is suited for tasks like summarization, image and video captioning, and data extraction from long documents. Both models are multimodal, capable of analyzing audio, video, and images in addition to text. Flash aims to offer speed and efficiency, making it ideal for applications where rapid output is crucial.
Google also unveiled new features to make its AI models cheaper and more useful. Context caching lets developers store large amounts of information that Gemini models can then access quickly and affordably. The Batch API, now in public preview on Vertex AI, handles multiple prompts in a single request, further reducing costs. Controlled generation, set to arrive later in the month, will let users define specific formats for model outputs.
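If the idea behind context caching and batching feels abstract, this toy Python sketch may help. It does not use Google's API. `ContextCache`, `toy_process`, and `answer_batch` are hypothetical names, and the "processing" step is a stand-in for whatever expensive work a real model does on a long context. The point is only that the long context is processed once and then reused across several prompts.

```python
import hashlib


class ContextCache:
    """Toy cache: process a long context once, reuse the result afterwards."""

    def __init__(self):
        self._store = {}
        self.misses = 0  # how many times the expensive step actually ran

    def get_or_process(self, context, process):
        # Key the cache on a hash of the context text.
        key = hashlib.sha256(context.encode("utf-8")).hexdigest()
        if key not in self._store:
            self.misses += 1
            self._store[key] = process(context)
        return self._store[key]


def toy_process(context):
    # Hypothetical stand-in for the expensive step; it just counts words.
    return len(context.split())


def answer_batch(cache, context, prompts):
    """Handle several prompts against one shared, cached context."""
    processed = cache.get_or_process(context, toy_process)
    return [f"{prompt} (context words: {processed})" for prompt in prompts]


if __name__ == "__main__":
    cache = ContextCache()
    document = "a very long document would go here"
    print(answer_batch(cache, document, ["Summarize it", "List the key dates"]))
    print(answer_batch(cache, document, ["Who is the author?"]))
    print("expensive processing runs:", cache.misses)
```

Three prompts are answered here, but the expensive step runs only once, which is the kind of saving these features are aiming for.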
At Google I/O 2024, Google introduced Veo, an advanced AI model capable of generating 1080p video clips up to a minute long from text prompts. Veo, which is seen as a competitor to OpenAI's Sora, can produce various visual styles such as landscapes and time lapses and can edit existing footage. Demis Hassabis from DeepMind highlighted ongoing explorations in features like storyboarding and generating longer scenes, showcasing Veo's capabilities. Unlike its predecessor based on Imagen 2, which created only low-resolution, short videos, Veo demonstrates significant advancements in video generation quality and complexity.
Veo was trained on extensive video footage, potentially including content from YouTube, though Google has not disclosed specifics. Training on vast datasets lets the model pick up patterns that it then uses to create new videos. While Google claims to adhere to ethical standards, concerns persist about how the data was used and about the model reproducing its training data verbatim. Despite these issues, Veo has already been made available to select creators like Donald Glover, and Google is positioning it as a powerful tool for the creative industry.
Douglas Eck from DeepMind emphasized Veo's technical strengths, including its ability to understand camera movements and visual effects from prompts, as well as physics elements like fluid dynamics and gravity. The model supports masked editing and can generate videos from still images. However, Veo is not without flaws, such as occasional inconsistencies in object behavior and physics errors. Currently, Veo is accessible via a waitlist on Google Labs, with plans to integrate its capabilities into YouTube Shorts and other products as it continues to improve.
Other announcements by Google:
- Google mentioned ‘AI’ 120+ times during its I/O keynote
- Google is redesigning its search engine — and it’s AI all the way down
- Google is building its Gemini Nano AI model into Chrome on the desktop
- Gemini comes to Gmail to summarize, draft emails, and more
- Circle to Search is now a better homework helper
Other stuff
Why it is so dangerous for AI to learn how to lie: ‘It will deceive us like the rich’
Tesla loses a top AI lead
AI Engineer Compensation Trends Q1 2024
ChatGPT consumes 25 times more energy than Google
AI that’s smarter than humans? Americans say a firm “no thank you.”
AI is the reason interviews are harder now
Writers and publishers in Singapore reject a government plan to train AI on their work
ChatGPT's new voice welcomes interruptions
Superpower ChatGPT now supports voice 🎉
Text-to-Speech and Speech-to-Text. Easily have a conversation with ChatGPT on your computer
Glue is a smarter work chat for people, apps, and AI.
Human-Level Translation, for Pennies, in Minutes.
Work HQ - The AI-first replacement for Salesforce and LinkedIn Recruiter
fynk - AI contract management
Wegic - The first AI web designer & developer by your side
AIWatchfulCompanion - AI caregiver to assist in caring for your loved ones
SaveDay - Instant summarizer app
Lune AI - Learn a new language with an AI tutor in your pocket
stoic. - The all-in-one journaling app, now with AI reflections
Canny Autopilot - Put feedback management on Autopilot with powerful AI tools
CustomerIQ - Turn feedback into revenue with AI
Julep - Open-source backend for building stateful AI apps
Unclassified 🌀
WFH Team - Work from anywhere in the world
Help share Superpower
⚡️ Be the Highlight of Someone's Day - Think a friend would enjoy this? Go ahead and forward it. They'll thank you for it!
Hope you enjoyed today's newsletter
Did you know you can add Superpower Daily to your RSS feed? https://rss.beehiiv.com/feeds/GcFiF2T4I5.xml
⚡️ Join over 200,000 people using the Superpower ChatGPT extension on Chrome and Firefox.