Superpower Daily
Posts
GPT-4 Turbo with Vision now available through API

GPT-4 Turbo with Vision now available through API

Google releases Imagen 2, a video clip generator

Saeed Ezzati
April 09, 2024

In today’s email:

🍎 Ferret UI, Apple's new AI model could help Siri see how iOS apps work
👀 Elon Musk predicts AI will overtake human intelligence next year
🎥 New Google Vids product helps create a customized video with an AI assist
🧰 6 new AI-powered tools and resources. Make sure to check the online version for the full list of tools.

OpenAI makes GPT-4 Turbo with Vision generally available through its API

OpenAI has announced the general availability of its GPT-4 Turbo with Vision model through its API, enhancing the capabilities for enterprise developers and company leaders to integrate advanced AI features into their applications. This new model combines vision recognition and analysis, allowing for a streamlined workflow where a single API call can analyze images and apply reasoning, facilitating more efficient app development.

The integration of vision capabilities with GPT-4 Turbo allows for innovative applications across various industries. For instance, the startup Cognition is using the model to develop an autonomous AI coding agent, while Healthify leverages it for nutritional analysis based on meal photos. Another startup, TLDraw, is utilizing this technology to transform user drawings on a virtual whiteboard into functional websites, showcasing the model's versatility.

Despite facing competition from newer AI models like Anthropic's Claude 3 Opus and Google's Gemini Advanced, OpenAI's GPT-4 Turbo with Vision aims to remain a competitive choice for developers. Its ability to handle extensive data, provide speedy interactions, and offer cost-effective solutions positions OpenAI favorably as it continues to innovate in the AI space, eagerly anticipated by the tech community and industry leaders.

Google releases Imagen 2, a video clip generator

Google has introduced Imagen 2, an advanced AI tool within its Vertex AI developer platform, offering new capabilities in image generation and editing. Imagen 2, which is a suite of models, now includes features like inpainting, outpainting, and the notable addition of creating short video clips or "live images" from text prompts. This tool is designed for corporate users, enabling them to overlay text, emblems, and logos onto various media, and it's been fine-tuned to generate content focusing on subjects like nature, food, and animals.

Google's Imagen 2 aims to address past concerns with AI-generated media by incorporating a new feature, SynthID, which applies invisible, cryptographic watermarks to its outputs to combat the potential creation of deepfakes. Despite these advancements, Google's live images, which are in a preview stage, offer lower resolution and fewer customization options compared to other AI video generation tools in the market, raising questions about its competitive edge.

The training of Imagen 2, like many generative AI models, remains somewhat opaque, with Google stating it uses data primarily from public web sources. There are ongoing concerns about the ethical use of such data and whether creators can opt out or be compensated for their contributions. Furthermore, Google's new text-to-live images feature is not yet covered under its generative AI indemnification policy, leaving users to navigate potential risks associated with copyright and model regurgitation issues.

More from Google:
- New Google Vids product helps create a customized video with an AI assist
- Google launches Code Assist, its latest challenger to GitHub’s Copilot
- Google’s new chips look to challenge Nvidia, Microsoft and Amazon
- Google partners with Bayer on new AI product for radiologists
- Google and the world’s largest ad group announce landmark AI collaboration

Ferret UI, Apple's new AI model could help Siri see how iOS apps work

Apple's latest AI development, Ferret LLM, is set to revolutionize how Siri interacts with iOS apps by enhancing its understanding of iPhone displays. This innovation stems from a collaborative effort between Apple and Cornell University researchers, culminating in a paper titled "Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs." Ferret-UI, a specialized multimodal large language model, aims to adeptly navigate and interpret the user interfaces on mobile devices, overcoming the challenges posed by the compact and intricate layouts typical of these screens.

The Ferret-UI model stands out for its ability to magnify and dissect mobile screens into manageable sections, improving its grasp of icons and text. This advanced comprehension allows Ferret-UI to respond accurately to user queries, like guiding a user to open an app or providing age-related information about an app, based on visible on-screen cues. This capability signifies a significant step forward in making digital assistants more interactive and helpful in managing tasks within apps.

Beyond enhancing user experience, Ferret-UI holds promise for aiding visually impaired individuals by offering detailed screen descriptions and executing commands within apps. This development not only showcases Apple's commitment to AI innovation but also hints at the potential for more intuitive and autonomous digital assistants in the future, capable of assisting users in a more nuanced and context-aware manner.

Elon Musk predicts AI will overtake human intelligence next year

Elon Musk predicts that artificial intelligence (AI) will surpass human intelligence by the end of next year, contingent on sufficient electricity and hardware supplies. During an interview, he projected that within five years, AI's capabilities could exceed the combined intelligence of all humans. This forecast outpaces his earlier prediction of achieving "full" artificial general intelligence (AGI) by 2029 and aligns with rapid advancement in AI, marked by new breakthroughs in chatbots and video generation tools.

Musk highlighted the current constraints in AI development, noting a shift from microchip shortages to limitations in data center equipment and electricity supply. Despite previously advocating for a halt in advanced AI development due to potential risks, Musk is now advancing his AI endeavor with xAI, aiming to surpass OpenAI's GPT-4 with upcoming models.

As part of his increased focus on AI, Musk is seeking substantial investment for xAI, aiming to position it as a competitor to OpenAI. His involvement in AI has been significant, despite a controversial departure from OpenAI in 2018. Currently, Musk is embroiled in a legal battle with OpenAI, alleging a deviation from its mission to develop AI for humanity's benefit.