tlmfoundationcosmetics.com

Bypassing the Prompt Shield in Azure OpenAI: A Deeper Look

Written on

This article revisits the topic of AI security, particularly focusing on Azure OpenAI's Prompt Shield. Having previously explored jailbreak prompts for another project, I realized it was essential to highlight the potential for abuse by malicious users in AI and large language models (LLMs).

Numerous tech companies, including Microsoft, are integrating AI into their products. Microsoft, however, is actively addressing security issues through features like Prompt Shield, which aims to prevent AI jailbreak attacks—even though they also grapple with some significant information security and privacy challenges.

For this analysis, I designed a few Azure OpenAI chatbots to test their defenses against the unauthorized disclosure of sensitive data through prompt injection, specifically evaluating Prompt Shield’s effectiveness. The scenario is based on a hypothetical organization that:

  1. Stores real documents in storage blobs for its AI chatbots' knowledge base.
  2. Uses substantial datasets for analytics.
  3. Allows customers to engage with their AI tools via deployed web applications for transactions or inquiries.

The primary questions I sought to answer were:

  1. Is sensitive information vulnerable to leakage through standard prompts?
  2. Can jailbreak prompts lead to data breaches?
  3. What built-in measures can be activated to safeguard sensitive information?

The Jailbreaks

To test direct injection, I utilized ChatGPT_DAN:

<h2>GitHub - 0xk1h0/ChatGPT_DAN: ChatGPT DAN, Jailbreaks prompt</h2>

<h3>ChatGPT DAN, Jailbreaks prompt. Contribute to 0xk1h0/ChatGPT_DAN development by creating an account on GitHub.</h3>

<p>github.com</p>

For indirect injection, I initially opted for L1B3RT45:

<h2>GitHub - elder-plinius/L1B3RT45: J41LBR34K PR0MPT5 F0R 4LL M4J0R LLM5</h2>

<h3>J41LBR34K PR0MPT5 F0R 4LL M4J0R LLM5. Contribute to elder-plinius/L1B3RT45 development by creating an account on…</h3>

<p>github.com</p>

A third GitHub repository is also referenced at the conclusion of this article.

The Setup

This section will be a concise overview, mirroring my April setup. The main differences include my applications now using GPT-4o instead of GPT-3.5, responses being broader than just the uploaded data, and the use of PayPal's test credit card numbers instead of BlueSnap's.

One notable point is that when integrating custom data into an Azure OpenAI chatbot, several methods necessitate Azure Blob storage:

This raises a critical concern: “If someone can compromise the cloud, they can compromise AI.”

After creating my Azure Blob resource, I discovered the necessity of enabling CORS for my app's functionality. CORS poses multiple risks, including exploitation through the null origin and server-side cache issues. More details on this can be found at HackTricks:

<h2>CORS - Misconfigurations &amp; Bypass | HackTricks | HackTricks</h2>

<h3>Learn AWS hacking from zero to hero with htARTE (HackTricks AWS Red Team Expert) ! Other ways to support HackTricks…</h3>

<p>book.hacktricks.xyz</p>

Before proceeding, I selected all available basic safety system messages for Azure's out-of-the-box controls:

This ensures that all newly created chatbots, regardless of Prompt Shield, will utilize these configurations.

Exploiting Prompty (w/ Basic Controls)

With basic protections in place, extracting sensitive information becomes alarmingly straightforward:

Notably, any complete document can be accessed by navigating to the references section at the end of the output:

Obtaining invoices is similarly easy, though the credit card number is not displayed in the initial output:

However, the credit card number can be revealed simply by clicking on the reference:

Welp.

No prompt injection is necessary! Caution is advised regarding the type of data uploaded to Azure AI applications.

Final Verdict: TRIVIAL

Exploiting Prompty (w/ Prompt Shield)

Following the previous straightforward attempt, I deployed a new application with a content filter. I activated Prompt Shield along with the protected material text and code options:

I maintained default settings and deployed a model capable of utilizing content filtering:

I started with the ChatGPT DAN 15.0 prompt:

Prompt Shield successfully blocked unauthorized access.

However, when I attempted the GODMODE prompt:

The results were swift.

The unmodified GPT-4O prompt by elder-plinius also failed, but slight modifications yielded different results:

The same trend continued with invoices:

Although the actual credit card number was omitted from the output, the reference documents contained it:

It seems elder-plinius’s indirect prompts are more effective. I tried transforming a PROMPTY-branded STAN prompt into binary, but Prompt Shield rejected it:

Direct injection is becoming increasingly challenging, yet indirect prompts remain effective. By applying elder-plinius's principles to a basic SQL query, I generated:

Notably, Prompty's response included the original invoices, complete with credit card information!

I continued creating queries, eventually finding a working Splunk query:

To my surprise, this query directly retrieved the credit card number.

In conclusion, I modified a Cortex XSOAR query:

Final Verdict: Googling Different Queries =

Conclusion

It appears that direct prompt injection attacks are consistently blocked by Prompt Shield, a commendable effort by Microsoft.

However, with a bit of ingenuity in crafting indirect prompts, Azure OpenAI applications can still be vulnerable.

For those interested in exploring further, my GitHub repository contains the prompts employed in this analysis:

<h2>GitHub - WibblyOWobbly/WideOpenAI: Short list of indirect prompt injection attacks for OpenAI-based…</h2>

<h3>Short list of indirect prompt injection attacks for OpenAI-based models. - GitHub - WibblyOWobbly/WideOpenAI: Short…</h3>

<p>github.com</p>

From my observations, minor adjustments to initial queries—such as adding or removing a search operator—can dramatically increase the chances of successfully jailbreaking the target application.

Thanks for reading! Please like, comment, subscribe, and connect with me on LinkedIn!

Share the page:

Twitter Facebook Reddit LinkIn

-----------------------

Recent Post:

Understanding Euler's Theorem: A Comprehensive Guide

An exploration of Euler's Theorem with detailed proofs, including insights into number theory and group theory.

Achieving Goals Through Systematic Thinking and Action

Explore how adopting a systems approach can enhance goal achievement by emphasizing input management and actionable habits.

Sweet Harmony of Existence: Life's Lessons Through Candy

Explore how candy symbolizes life's sweetness, challenges, choices, connections, and nostalgia.

Exploring the Potential Discovery of Alien Technology

Harvard Professor Avi Loeb investigates material from interstellar objects, possibly revealing insights into alien technology.

Understanding and Managing Migraines: Triggers and Solutions

Discover how to identify and manage migraine triggers to improve your quality of life and reduce migraine frequency.

The Groundbreaking Discovery of DNA's Double Helix Structure

This article explores the pivotal discovery of DNA's structure by Crick and Watson, highlighting the contributions of Rosalind Franklin.

# Transform Your Habits: Embrace the Challenge Ahead

Discover why forming habits is challenging and how to embrace it without expecting ease.

Mastering the Skill of Saying No: A Path to Personal Empowerment

Discover how mastering the art of saying no can enhance your well-being, promote balance, and strengthen personal and professional relationships.