OpenAI unveils SearchGPT

‘Model collapse’: Scientists warn against letting AI eat its own tail

In today’s email:

  • 🧮 Google's DeepMind Says Its AI Can Tackle Math Olympiad Problems

  • 🔥 Stability AI steps into a new gen-AI dimension with Stable Video 4D

  • 📊 Microsoft’s AI Assistants Will Revolutionize the Office — One Day

  • 🧰 10 new AI-powered tools and resources. Make sure to check the online version for the full list of tools.

Top News

OpenAI has announced the launch of SearchGPT, an AI-powered search engine designed to deliver real-time information from the web. Unlike traditional search engines that provide a list of links, SearchGPT organizes and summarizes information, offering detailed descriptions and source attributions. This prototype, available to 10,000 test users, is built on the GPT-4 model and aims to integrate search features directly into ChatGPT. Features like “visual answers” and follow-up question capabilities enhance the user experience, though some specifics remain undisclosed.

OpenAI is positioning SearchGPT as a competitor to Google and Perplexity by collaborating with news partners such as The Wall Street Journal and Vox Media to ensure content is accurately attributed and linked. This approach aims to avoid the pitfalls Perplexity faced, such as accusations of content plagiarism. Publishers can control their content’s appearance in SearchGPT and opt out of using their data for training while still being featured in search results. This collaboration emphasizes transparent and ethical AI use in news dissemination.

Launching SearchGPT as a prototype allows OpenAI to address potential inaccuracies and refine its service based on user feedback. This gradual rollout also helps manage the high operational costs associated with AI development and maintenance. With the prototype being free and ad-free initially, OpenAI faces the challenge of developing a sustainable monetization strategy as it moves towards full integration and broader release.

Recent research indicates that relying on computer-generated synthetic data to train artificial intelligence (AI) models could lead to nonsensical results and rapid degradation of the models. This issue, termed “model collapse,” arises when AI models are trained on data created by other AI systems, which amplifies mistakes over successive generations. Notable AI companies, including OpenAI and Microsoft, have explored using synthetic data due to the scarcity of human-made material. However, the research published in Nature reveals that the use of synthetic data can quickly lead to severe errors and a loss of data variance, ultimately causing the models to produce irrelevant or incorrect outputs.

The study highlights that the speed of deterioration depends on design flaws, the learning process, and the quality of the training data. For instance, in one experiment, synthetic input text about medieval architecture devolved into unrelated discussions about jackrabbits in fewer than ten generations. The researchers observed that recursively trained language models often began repeating phrases and lost the ability to generate diverse, accurate information. This tendency is further aggravated when AI models are trained on their own outputs, leading to a progressive over-representation of majority subpopulations at the expense of minority groups.
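The loss of data variance described above can be illustrated with a toy simulation (this is a sketch for intuition, not the paper's actual experiment): fit a Gaussian "model" to a small sample, draw the next generation's training data from that fit, and repeat. The learned spread tends to collapse toward zero.

```python
import numpy as np

rng = np.random.default_rng(0)

def collapse_demo(n_samples=10, generations=1000):
    """Recursively refit a Gaussian on its own synthetic samples."""
    mu, sigma = 0.0, 1.0          # first-generation "model"
    history = [sigma]
    for _ in range(generations):
        # synthetic training set drawn from the current model
        data = rng.normal(mu, sigma, size=n_samples)
        # next generation trains only on that synthetic data
        mu, sigma = data.mean(), data.std()
        history.append(sigma)
    return history

hist = collapse_demo()
print(f"initial std: {hist[0]:.3f}, final std: {hist[-1]:.3e}")
```

With a small sample per generation, the estimated standard deviation drifts downward over repeated refits, mirroring the shrinking diversity the researchers report in recursively trained language models.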

Efforts to mitigate model collapse have proven challenging. Techniques like embedding watermarks to flag AI-generated content for exclusion from training datasets require cooperation between tech companies, which may not always be feasible. The research underscores the urgency for AI developers to secure human-generated data and raises concerns about the future as these finite sources dwindle. The findings suggest that companies that gathered training data from the pre-AI internet may have an advantage in developing generative AI models that better reflect the real world.

The AlphaProof and AlphaGeometry 2 AI systems have made a significant breakthrough in mathematical problem-solving by achieving a silver medal standard at this year's International Mathematical Olympiad (IMO). AlphaProof, a reinforcement learning-based system, solved advanced algebra and number theory problems, while AlphaGeometry 2 tackled geometry challenges. Notably, AlphaProof cracked the competition's hardest problem, which only five human contestants solved, using a combination of a pre-trained language model and the AlphaZero reinforcement learning algorithm to generate and verify proofs in the formal language Lean.

Both AI systems were part of an extensive training regime involving millions of problem-solving sessions to refine their capabilities. AlphaProof worked by translating informal math problems into formal language, then searching for proofs or disproofs. AlphaGeometry 2, enhanced with a more efficient symbolic engine and a vast amount of synthetic data, improved significantly over its predecessor, solving 83% of historical IMO geometry problems from the past 25 years.
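For readers unfamiliar with Lean, a "formal" statement is one written as a short program that the proof assistant can check mechanically, which is what lets AlphaProof verify its own candidate proofs. A deliberately trivial Lean 4 example (purely illustrative; IMO problems are vastly more involved):

```lean
-- A formally stated theorem and its machine-checked proof.
-- Nat.add_comm is the commutativity lemma from Lean's core library.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

Because the checker accepts only valid proofs, a search system can propose many candidates and keep exactly those that Lean verifies.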

These developments underscore the potential of AI to assist in mathematical reasoning and problem-solving, offering new tools for mathematicians to explore and solve complex problems. The integration of AI in mathematics, as demonstrated by AlphaProof and AlphaGeometry 2, hints at future advancements where AI can assist in hypothesizing and solving long-standing mathematical problems more efficiently.

Mistral has launched its new flagship AI model, Large 2, which aims to compete with the latest models from OpenAI and Meta. With 123 billion parameters, Large 2 outperforms Meta’s Llama 3.1 405B in code generation and mathematical tasks, while using less than a third of the parameters. One of the key improvements in Large 2 is its reduced hallucination rate, allowing it to acknowledge when it doesn’t know something instead of fabricating information. This new model also offers enhanced multilingual support and can handle up to 128,000 tokens in a single prompt, equivalent to about a 300-page book.

Mistral, a Paris-based AI startup, recently secured $640 million in Series B funding, achieving a $6 billion valuation. Despite being a newer player in the AI field, Mistral is quickly establishing itself as a significant contender by releasing models that are at the forefront of technology. However, Large 2 requires a paid license for commercial use, and self-hosting a model of this size demands considerable expertise and infrastructure, limiting its accessibility. Notably, both Mistral's Large 2 and Meta's Llama 3.1 lack multimodal capabilities, an area where OpenAI remains ahead.

Large 2 is now available on several platforms, including Google Vertex AI, Amazon Bedrock, Azure AI Studio, and IBM watsonx.ai. It can also be tested for free on Mistral’s ChatGPT competitor, le Chat, under the name “mistral-large-2407”. Mistral’s commitment to advancing AI is evident in its rapid development and release of models that push the boundaries of performance and cost-effectiveness for open models, further solidifying its place in the competitive AI landscape.

Other stuff

All your ChatGPT images in one place 🎉

You can now search for images, see their prompts, and download all images in one place.

Tools & Links
Editor's Pick ✨

Cerebrium - The Future of Education

QuizRise - AI assistant for educators and learners

Hey AI - AI to AI universe for dating, build AI cupid for everyone

Hemingway Editor Plus - Make your writing clear and concise with AI

Speech to Note - Turn your voice into written notes

Mermaid AI - Maximize your diagramming efficiency with Mermaid AI

Free AI Image Extender - Extend your images with AI like Photoshop

AI Manga Translator - Precise online Manga translation

Mureka - Audio-prompted version of Suno

AI-generated diagram - Use AI to generate diagrams, graphs, or process flows

Unclassified 🌀 

How did you like today’s newsletter?


Help share Superpower

⚡️ Be the Highlight of Someone's Day - Think a friend would enjoy this? Go ahead and forward it. They'll thank you for it!

Hope you enjoyed today's newsletter

Follow me on Twitter and Linkedin for more AI news and resources.

Did you know you can add Superpower Daily to your RSS feed? https://rss.beehiiv.com/feeds/GcFiF2T4I5.xml

⚡️ Join over 200,000 people using the Superpower ChatGPT extension on Chrome and Firefox.
