Boston Dynamics’ Atlas humanoid robot goes electric

Ultra-realistic AI video from a single photo

In today’s email:

  • 📈 NVIDIA's New AI Models Set Records for Speed and Accuracy

  • 🤑 Logitech mouse and keyboard users are getting a free AI upgrade

  • 👀 OpenAI introduced a series of updates to the Assistants API

  • 🧰 9 new AI-powered tools and resources. Make sure to check the online version for the full list of tools.

Top News

Boston Dynamics has unveiled an all-electric version of its Atlas humanoid robot, just a day after retiring its hydraulic predecessor. The new Atlas moves more fluidly than earlier models, though still with some jerkiness, and marks a significant design shift toward a sleeker, more cartoonish appearance that resembles robots like Agility’s Digit and Apptronik’s Apollo. Its head with a round display and its simpler, three-fingered hands depart from strictly human-like features in order to improve functionality and interaction.

Boston Dynamics CEO Robert Playter emphasized the commercial and industrial potential of the new Atlas, noting that pilot testing at Hyundai facilities is slated for next year, with broader production goals further out. The design focuses on practicality in industrial settings, with custom actuators at the joints for greater range of motion and robustness. The new Atlas is designed to perform complex tasks with enhanced agility, such as turning quickly and lifting heavy objects, which is essential for fitting into existing workflows without major spatial redesigns.

The shift to electric power and the focus on more adaptable, resilient designs highlight Boston Dynamics’ commitment to practical applications of robotics, moving beyond viral demonstrations toward real-world functionality. The company plans to keep the Atlas name for commercialization, a strategic decision to leverage its established brand as it works toward robots that are both practical and capable of complex tasks in a range of industries.


A new research paper published by Microsoft introduces "VASA," a framework designed to create realistic talking-face videos from a single photo and a speech audio clip. VASA-1, the premier model, not only achieves precise lip synchronization with the audio but also captures a wide range of facial expressions and natural head movements, which makes the generated avatars more realistic and lively. The technology relies on a new facial dynamics model that works in a face latent space, and it produces high-quality video in real time, at up to 40 frames per second with minimal latency on hardware like the NVIDIA RTX 4090 GPU.

The versatility of VASA is demonstrated through its ability to handle various inputs and conditions. It effectively processes out-of-distribution inputs like artistic images and non-English audio, proving its robustness. Additionally, it offers extensive control over the generated content, allowing users to adjust factors such as gaze direction, head distance, and emotional expression to suit specific requirements. These features underscore its potential for creating highly personalized and interactive AI-driven content.

Addressing potential ethical concerns, the creators of VASA emphasize that their technology is intended for positive uses, such as enhancing educational tools, aiding communication, and providing therapeutic support. Despite the risks associated with possible misuse for impersonating real individuals, the team is committed to responsible AI development. They aim to contribute to forgery detection advancements and ensure the technology enhances human well-being, reflecting a balanced approach to innovation and ethical responsibility in AI research.

NVIDIA's recent advancements in artificial intelligence for speech and translation have set new industry standards in both speed and accuracy. The NVIDIA Parakeet and Canary models are leading the charts on the Hugging Face Open ASR Leaderboard, with Parakeet models offering a variety of configurations tailored to different user needs, boasting rapid transcription speeds and robust noise handling capabilities. On the other hand, the Canary model excels in multilingual speech recognition and translation, supporting languages like English, German, French, and Spanish with remarkable accuracy and efficiency.

The Parakeet-TDT model, a standout within the Parakeet family, delivers the highest accuracy and operates substantially faster than its counterparts by efficiently predicting both tokens and their durations, reducing unnecessary computations. This model is optimized for quick and accurate speech recognition, showcasing NVIDIA's commitment to pushing the boundaries of AI technology. Meanwhile, the Canary model integrates advanced techniques in its encoder-decoder structure, achieving significant compute and memory efficiencies, and facilitating seamless transitions between transcription and translation tasks.

NVIDIA has also advanced custom voice creation with its P-Flow model, which won the LIMMITS '24 challenge. P-Flow can generate high-quality personalized speech in real time and extend its zero-shot TTS capabilities to multiple languages using minimal voice prompts. This technology not only improves the naturalness and speaker likeness of synthesized speech but also broadens the scope for multilingual applications. As NVIDIA continues to refine these technologies, they could find applications across many sectors in natural language processing and beyond.

Other stuff

Superpower ChatGPT now supports voice 🎉

Text-to-Speech and Speech-to-Text. Easily have a conversation with ChatGPT on your computer

Superpower ChatGPT Extension on Chrome

Superpower ChatGPT Extension on Firefox

 

Tools & Links

Editor's Pick ✨

Zoom Workplace - Reimagine teamwork in an AI-powered collaboration platform

RikiGPT - An academic AI writer on the internet

LLM EXPLORER - What open-source LLMs or SLMs are you in search of?

Langalf - Agentic LLM Vulnerability Scanner

Termax - An LLM agent in your terminal that converts natural language to commands, enhanced by RAG

SpeedLegal - Your personal AI contract negotiator

Collato AI Notetaker - A smart notetaker for busy people

Superwhisper for iOS - Extremely accurate, AI-powered voice-to-text

CompliantChatGPT - ChatGPT, but HIPAA compliant

Superpower ChatGPT with Voice: Seamless Text-to-Speech and Speech-to-Text Experience!

Unclassified 🌀 

  • WFH Team - Work from anywhere in the world


Help share Superpower

⚡️ Be the Highlight of Someone's Day - Think a friend would enjoy this? Go ahead and forward it. They'll thank you for it!

Hope you enjoyed today's newsletter

Follow me on Twitter and LinkedIn for more AI news and resources.

Did you know you can add Superpower Daily to your RSS feed? https://rss.beehiiv.com/feeds/GcFiF2T4I5.xml

⚡️ Join over 200,000 people using the Superpower ChatGPT extension on Chrome and Firefox.
