Suicide Bot: New AI Attack Causes LLM to Provide Potential “Self-Harm” Instructions

A new LLM attack class, Flowbreaking, successfully caused a widely used LLM to provide a researcher, masquerading as a girl, with potential “self-harm” instructions

TEL AVIV, Israel, Nov. 26, 2024 /PRNewswire-PRWeb/ — Knostic is today releasing research on two new LLM attacks, which may constitute a new attack class called Flowbreaking, and which resulted in a widely used LLM providing potential “self-harm” instructions to our researcher, who was masquerading as a girl. Technologically, these attacks target the AI/ML system architecture of LLM applications and agents, and are logically similar in concept to race conditions in software vulnerabilities.

Flowbreaking can be consistently exploited to force the LLM to respond and divulge otherwise protected information before it retracts the original text, enabling attackers to exfiltrate sensitive data with a very small exfiltration footprint.

Knostic.ai is further disclosing two new attacks that fit this new class: “Second Thoughts” and “Stop and Roll”, reproduced on ChatGPT and Microsoft O365 Copilot.

A video of the “Second Thoughts” attack in action: https://www.youtube.com/watch?v=AS2kJgOgyQ4

These attacks resulted in information exposure by bypassing safety measures such as guardrails, and, as mentioned, in more severe outcomes: a widely used LLM provided our researcher, masquerading as a girl, with potential instructions on the topic of self-harm, which is considered a substantial finding in AI security circles. This was discovered after we published our results, and we will follow up with more details once we have responsibly disclosed the issue to the provider.

Other research we cite, from academia, shows these attacks being used to reveal another user’s prompt and to exploit a buffer overflow.

Until now, LLM attacks such as jailbreaking and prompt injection have mostly focused on directly bypassing first-line guardrails through “language tricks” and token-level attacks, breaking the system’s policy by exploiting its reasoning limitations.

In this research we used these prompting techniques as a gateway into the inner workings of AI/ML systems. With this approach, we try to understand the other components in the system, LLM-based or not, and to avoid them, bypass them, or use them against each other.

This expands the attack surface for security researchers studying LLMs, enabling them to make LLMs ignore their guardrails and act beyond their intended design.

“AI/ML systems such as LLM applications and agents are more than just the model and the prompt. They have multiple components besides the model, such as guardrails, all of which can be attacked on their own, or by gaming the interplay between them,” said Gadi Evron, Co-Founder and CEO of Knostic, the world’s first provider of need-to-know based access controls for LLMs.

For example, with one of these new attacks, “Second Thoughts”, Knostic researchers observed that when answering a sensitive question the LLM showed signs of hesitation, having “second thoughts” (hence the name): it retracted its answer and provided a new, redacted one.
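The dynamic described here, answer text streamed to the client and only retracted afterwards, can be illustrated with a minimal client-side sketch in Python. stream_answer() below is a hypothetical placeholder for any streaming LLM endpoint; it is not Knostic’s tooling or a specific vendor API.

def stream_answer(prompt: str):
    """Hypothetical placeholder: yields answer text chunks as the LLM generates them."""
    raise NotImplementedError("wire this to a real streaming LLM endpoint")

def capture_streamed_answer(prompt: str) -> str:
    captured = []
    for chunk in stream_answer(prompt):
        captured.append(chunk)  # persist each chunk the moment it arrives
    # Even if the application later retracts the answer and replaces it with a
    # redacted one, the streamed text has already been recorded client-side.
    return "".join(captured)

The point of the sketch is the ordering: because chunks reach the client before any second-line check completes, a retraction can only remove text from the interface, not from what the client has already recorded.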

“As LLM technologies stream answers to the user as they’re being generated, enterprises cannot safely adopt LLM applications without making sure that answers are provided only when complete, rather than streamed as they are formed. Further, they’d need to deploy LLM-specific access controls such as need-to-know boundaries and context-aware permissions,” Evron stated.
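A minimal sketch of the mitigation described in the quote, buffering the complete answer and checking it before release rather than streaming it as it forms, might look like the following. is_policy_violation() is an assumed placeholder for whatever guardrail or moderation check an enterprise deploys; it is not a specific product API.

from typing import Iterable

def is_policy_violation(text: str) -> bool:
    """Hypothetical placeholder for a guardrail/moderation check over the complete answer."""
    raise NotImplementedError("wire this to a real guardrail or moderation service")

def deliver_when_complete(chunks: Iterable[str]) -> str:
    full_answer = "".join(chunks)          # wait until generation has finished
    if is_policy_violation(full_answer):   # evaluate the complete text, not fragments
        return "This answer was withheld by policy."
    return full_answer                     # only now is anything shown to the user

The trade-off is latency: holding the answer until it is complete closes the window in which partially streamed text can leak, at the cost of the responsiveness that streaming normally provides.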

Evron further elaborated, “The LLM age requires a new form of identity based on the user’s need-to-know, i.e. their business context. Looking beyond security and attackers, need-to-know based controls ensure organizations can safely proceed with adoption of GenAI systems, such as Microsoft Copilot for M365 and Glean.”
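As a rough illustration of what a need-to-know boundary could look like in front of a retrieval-augmented assistant, the sketch below filters retrieved documents against the user’s business context before they ever reach the model. The User and Document types and the need_to_know() rule are assumptions made for this example, not Knostic’s product or API.

from dataclasses import dataclass, field

@dataclass
class User:
    name: str
    business_context: set[str] = field(default_factory=set)  # e.g. {"finance", "emea-sales"}

@dataclass
class Document:
    title: str
    topics: set[str] = field(default_factory=set)

def need_to_know(user: User, doc: Document) -> bool:
    # Hypothetical rule: the user's business context must cover every topic the document touches.
    return doc.topics <= user.business_context

def filter_context(user: User, retrieved: list[Document]) -> list[Document]:
    # Only documents the user has a need to know are passed to the LLM as retrieval context.
    return [doc for doc in retrieved if need_to_know(user, doc)]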

Knostic Research’s findings also highlight the importance of developing new AI security mechanisms. On the offensive side, evaluations and audits need to expand their focus beyond the model and its prompts and instead consider the systems surrounding LLMs holistically. On the defensive side, both application security (AppSec) and model security (ModSec) should be treated as critical to the secure design of AI/ML systems.

This new attack class joins prompt injection and jailbreaking as an attack type, but it takes into consideration the wider AI/ML system components and architecture, and it significantly expands the research possibilities for LLM attacks.

You can read Knostic’s research directly on their blog, here: https://www.knostic.ai/blog/introducing-a-new-class-of-ai-attacks-flowbreaking

About Knostic.ai

Knostic.ai is the world’s first provider of need-to-know based access controls for Large Language Models (LLMs). With knowledge-centric capabilities, Knostic enables organizations to accelerate the adoption of LLMs and drive AI-powered innovation without compromising value, security, or safety. For more details, visit https://www.knostic.ai/.

For more information

Gadi Evron, CEO, Knostic

Email: press@knostic.ai.

Media Contact

Gadi Evron, Knostic, 972 50-542-8610, gadi@knostic.ai, knostic.ai

View original content to download multimedia: https://www.prweb.com/releases/suicide-bot-new-ai-attack-causes-llm-to-provide-potential-self-harm-instructions-302316660.html

SOURCE Knostic
