Introducing Gemini 1.0: Google's Most Capable AI Model

Sam Altman was named TIME’s CEO of the year

In today’s email:

  • 🧮 How AI Discovered a Faster Matrix Multiplication Algorithm

  • 📣 How to create an AI narrator for your life

  • 🧠 How AI could lead to a better understanding of the brain

  • 🧰 9 new AI-powered tools and resources. Make sure to check the online version for the full list of tools.

Top News

Gemini, developed collaboratively by Google teams, is a multimodal model, able to process and combine text, code, audio, images, and video. It is designed to be flexible and efficient, running on various platforms from data centers to mobile devices. Gemini comes in three versions: Ultra, Pro, and Nano, each optimized for different scales and complexities of tasks.

Gemini Ultra has surpassed human experts on MMLU (massive multitask language understanding) and outperformed previous models on various benchmarks, including image understanding and complex reasoning. It is also highly capable at coding: a specialized version of Gemini powers AlphaCode 2, which performs strongly in competitive programming.

Gemini's development utilized Google's Tensor Processing Units (TPUs), ensuring efficiency and scalability. With its multimodal capabilities, Gemini can analyze complex information across various formats, aiding in fields like science and finance.

Gemini is being integrated into various Google products like Bard, Pixel, Search, and others. It will be available to developers and enterprise customers through the Gemini API, with Gemini Ultra set to be released after extensive safety checks and refinements. 

Note:

Despite the excitement, the Gemini benchmarks appear to be heavily gamed. They seem to be carefully presented and use different prompting techniques to show Gemini beating GPT-4. Since no one can actually use Gemini Ultra to confirm the results, I think it’s safe to assume this is fluff for shareholders, and that once the public gets their hands on it, it will be apparent that GPT-4 is still the better model: better aligned and with fewer hallucinations.

Work faster than ever with ClickUp AI:

  • ⚡️ Supercharge your writing and creativity

  • 💯 Choose from 100+ use cases tailored to specific roles

  • 🔥 Generate action items from Docs and conversations ...

and so much more! 🤩

Creating an AI narrator for your life involves three main components:

  1. Vision Model: This model uses a computer camera to "see" and describe images. The author suggests two options:

    • LLaVA 13B: An open-source, cost-effective model that gives basic descriptions of images.

    • GPT-4-Vision: A more advanced and slightly more expensive model that provides detailed, nuanced descriptions.

  2. Language Model: This model writes the script for narration. Mistral 7B is used to generate a nature documentary-style script based on the image description provided by the vision model. Alternatively, GPT-4-Vision can be used to both describe the image and generate the script in one step.

  3. Text-to-Speech Model: This converts the written script into spoken audio. The author recommends ElevenLabs's voice cloning feature for high-quality output or XTTS-v2 as an open-source alternative. These tools can mimic specific voices, enhancing the narration's realism.
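The three components above can be chained together roughly as shown in the sketch below. This is a minimal illustration, not the author's actual code: the Narrator class, build_prompt, and the three fake_* stand-ins are hypothetical names invented here, and no real model API is called. In practice each stand-in would be swapped for a call to a vision model (for example LLaVA or GPT-4-Vision), a language model (for example Mistral 7B), and a text-to-speech service (for example ElevenLabs or XTTS-v2).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Narrator:
    """Chains the three stages: image -> description -> script -> audio.

    Every stage is a plain callable so any real model client can be plugged in.
    All names in this sketch are hypothetical.
    """
    describe_image: Callable[[bytes], str]   # vision model
    write_script: Callable[[str], str]       # language model
    synthesize: Callable[[str], bytes]       # text-to-speech model

    def narrate(self, frame: bytes) -> bytes:
        description = self.describe_image(frame)
        script = self.write_script(description)
        return self.synthesize(script)


def build_prompt(description: str) -> str:
    """Builds an example prompt a language model could be given (assumed wording)."""
    return (
        "Narrate the following scene in the style of a nature documentary:\n"
        f"{description}"
    )


# Stand-ins so the sketch runs without any model or network access.
def fake_vision(frame: bytes) -> str:
    return "a person drinking from a cup"


def fake_language(description: str) -> str:
    return f"Here we observe {description}."


def fake_tts(script: str) -> bytes:
    # A real text-to-speech model would return audio; this returns the text as bytes.
    return script.encode("utf-8")


if __name__ == "__main__":
    narrator = Narrator(fake_vision, fake_language, fake_tts)
    audio = narrator.narrate(b"raw camera frame bytes")
    print(audio.decode("utf-8"))
```

Keeping each stage behind a simple function makes it easy to swap the cheaper open-source option for the more expensive one, or to collapse the first two stages into a single GPT-4-Vision call.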

The author demonstrates this process using a video where an AI clone of Sir David Attenborough describes the author drinking from a cup, humorously suggesting it as part of a “mating display.” This video went viral, showcasing the potential of these AI tools.

It was a strange Thanksgiving for Sam Altman. Normally, the CEO of OpenAI flies home to St. Louis to visit family. But this time the holiday came after an existential struggle for control of a company that some believe holds the fate of humanity in its hands. Altman was weary. He went to his Napa Valley ranch for a hike, then returned to San Francisco to spend a few hours with one of the board members who had just fired and reinstated him in the span of five frantic days. He put his computer away for a few hours to cook vegetarian pasta, play loud music, and drink wine with his fiancé Oliver Mulherin. “This was a 10-out-of-10 crazy thing to live through,” Altman tells TIME on Nov. 30. “So I’m still just reeling from that.” Continue reading …

Other stuff

GPT Store now in Superpower ChatGPT. 🎉

1000s of custom GPTs right inside ChatGPT and adding more every day

Superpower ChatGPT Extension on Chrome

 

Superpower ChatGPT Extension on Firefox

 

Tools & Links
Editor's Pick ✨

Imagine with Meta AI - Describe an image for Meta AI to generate.

JetBrains AI - Supercharge your tools. Embrace new freedom.

Santa Cat - Brings AI-powered holiday magic to families and showcases building voice-driven virtual characters

CopilotKit - Build in-app AI chatbots 🤖 and AI-powered textareas into React web apps.

Respeecher - Voice Cloning for Content Creators

Lume - Automate data mappings using AI

Stey.ai - Breakthrough uses AI to find the reason why your users give up

Krater.ai - Your super-app for all things AI

Superpowered AI - API for retrieval augmented generation

Mastering ChatGPT: Unlock the Power of Prompt Templates & Supercharge Your Workflow!

Unclassified 🌀 

  • WFH Team - Work from anywhere in the world

Help share Superpower

⚡️ Be the Highlight of Someone's Day - Think a friend would enjoy this? Go ahead and forward it. They'll thank you for it!

Hope you enjoyed today's newsletter

Follow me on Twitter and LinkedIn for more AI news and resources.

Did you know you can add Superpower Daily to your RSS feed? https://rss.beehiiv.com/feeds/GcFiF2T4I5.xml

⚡️ Join over 200,000 people using the Superpower ChatGPT extension on Chrome and Firefox.
