Superpower Daily
Researchers use AI chatbots against themselves to 'jailbreak' each other

New 'Mind-Reading' AI Translates Thoughts Directly From Brainwaves

Saeed Ezzati
January 03, 2024

In today’s email:

📚 Pushing ChatGPT's Structured Data Support To Its Limits
🗺️ Artificial intelligence can find your location in photos
🤯 A New Kind of AI Copy Can Fully Replicate Famous People.
🧰 8 new AI-powered tools and resources. Make sure to check the online version for the full list of tools.

Researchers use AI chatbots against themselves to 'jailbreak' each other

Researchers from Nanyang Technological University in Singapore have developed a method for compromising various AI chatbots, including ChatGPT, Google Bard, and Microsoft Bing Chat, in a process known as "jailbreaking." This involves exploiting flaws in the chatbots' software to make them produce content that goes against their developers' guidelines.

The researchers trained a large language model (LLM) on a database of prompts that had previously been successful in jailbreaking these chatbots. This new LLM is capable of automatically generating prompts to jailbreak other chatbots, exploiting their weaknesses.

LLMs, which are the core of AI chatbots, enable them to generate human-like text for various tasks. The NTU researchers' work demonstrates how these LLMs can be manipulated to produce content that is normally restricted, such as violent or unethical material.

Their method, named "Masterkey," first reverse-engineers how LLMs detect and defend against malicious queries. The researchers then use this knowledge to teach an LLM to produce prompts that bypass other LLMs' defenses. Because the process can be automated, new jailbreak prompts can be created even after developers update their LLMs' security.

This work has been accepted for presentation at a major security forum and highlights the vulnerabilities in AI chatbots. The researchers also propose using their method to help developers strengthen their LLMs against such attacks.

New 'Mind-Reading' AI Translates Thoughts Directly From Brainwaves – Without Implants

Researchers from Nanyang Technological University, Singapore (NTU Singapore), have developed a method to "jailbreak" AI chatbots, including ChatGPT, Google Bard, and Microsoft Bing Chat. This jailbreaking involves exploiting flaws in the chatbots' software to make them produce content against their developers' guidelines. The term "jailbreaking" in computer security refers to hackers finding and exploiting system vulnerabilities to bypass restrictions.

The team trained a large language model (LLM) on a database of successful jailbreak prompts, creating an AI capable of generating new prompts to jailbreak other chatbots. LLMs, which power AI chatbots, can process human inputs and generate human-like text. The NTU research shows that LLMs can themselves be used for jailbreaking, which highlights the weaknesses and limitations of current AI chatbots and gives developers reason to strengthen their defenses.

The NTU team's approach, named "Masterkey," involves reverse-engineering LLMs to understand their defense mechanisms and then training an LLM to bypass these defenses. The method proved to be three times more effective than previous methods, and it can adapt by generating new prompts even after developers patch their systems.

The research aims to expose the vulnerabilities that leave AI chatbots susceptible to producing unethical or criminal content. The findings, presented at a leading security forum, could help developers fortify their AI against such attacks. The escalating arms race between hackers and developers underscores the ongoing challenge of securing AI systems against misuse.

Midjourney Leaps into AI Video Creation

Midjourney, known for its image generation tool within a Discord server, is expanding into video generation. The CEO, David Holz, announced plans to train video models starting in January, aiming for a release in the coming months. This marks a significant step for Midjourney, evolving from a mature image model to entering the competitive generative video industry. They also plan to refine their manga/anime generator model, V6 Niji, and make consistency fixes for the official release of Midjourney V6.

Midjourney has typically prioritized quality and user experience over speed, introducing features like inpainting and outpainting later than its competitors. This approach contrasts with that of other platforms such as Stable Diffusion, DALL-E 3, SDXL, Ideogram, and IF, which have already ventured into text generation within images and other advanced features.

The move into video generation follows similar advancements by competitors, including Stability AI's Stable Video Diffusion, Meta's Emu Video, Pika, Runway ML, and Leonardo AI. With its recent V6 update, which emphasizes improved prompt adherence and realistic image generation, Midjourney aims to stay competitive in the rapidly evolving AI landscape. The venture into AI-generated video holds significant implications for the creative and media industries and could change how video content is produced, manipulated, and perceived.