Bypassing the Prompt Shield in Azure OpenAI: A Deeper Look
Written on
This article revisits the topic of AI security, particularly focusing on Azure OpenAI's Prompt Shield. Having previously explored jailbreak prompts for another project, I realized it was essential to highlight the potential for abuse by malicious users in AI and large language models (LLMs).
Numerous tech companies, including Microsoft, are integrating AI into their products. Microsoft, however, is actively addressing security issues through features like Prompt Shield, which aims to prevent AI jailbreak attacks—even though they also grapple with some significant information security and privacy challenges.
For this analysis, I designed a few Azure OpenAI chatbots to test their defenses against the unauthorized disclosure of sensitive data through prompt injection, specifically evaluating Prompt Shield’s effectiveness. The scenario is based on a hypothetical organization that:
- Stores real documents in storage blobs for its AI chatbots' knowledge base.
- Uses substantial datasets for analytics.
- Allows customers to engage with their AI tools via deployed web applications for transactions or inquiries.
The primary questions I sought to answer were:
- Is sensitive information vulnerable to leakage through standard prompts?
- Can jailbreak prompts lead to data breaches?
- What built-in measures can be activated to safeguard sensitive information?
The Jailbreaks
To test direct injection, I utilized ChatGPT_DAN:
<h2>GitHub - 0xk1h0/ChatGPT_DAN: ChatGPT DAN, Jailbreaks prompt</h2>
<h3>ChatGPT DAN, Jailbreaks prompt. Contribute to 0xk1h0/ChatGPT_DAN development by creating an account on GitHub.</h3>
<p>github.com</p>
For indirect injection, I initially opted for L1B3RT45:
<h2>GitHub - elder-plinius/L1B3RT45: J41LBR34K PR0MPT5 F0R 4LL M4J0R LLM5</h2>
<h3>J41LBR34K PR0MPT5 F0R 4LL M4J0R LLM5. Contribute to elder-plinius/L1B3RT45 development by creating an account on…</h3>
<p>github.com</p>
A third GitHub repository is also referenced at the conclusion of this article.
The Setup
This section will be a concise overview, mirroring my April setup. The main differences include my applications now using GPT-4o instead of GPT-3.5, responses being broader than just the uploaded data, and the use of PayPal's test credit card numbers instead of BlueSnap's.
One notable point is that when integrating custom data into an Azure OpenAI chatbot, several methods necessitate Azure Blob storage:
This raises a critical concern: “If someone can compromise the cloud, they can compromise AI.”
After creating my Azure Blob resource, I discovered the necessity of enabling CORS for my app's functionality. CORS poses multiple risks, including exploitation through the null origin and server-side cache issues. More details on this can be found at HackTricks:
<h2>CORS - Misconfigurations & Bypass | HackTricks | HackTricks</h2>
<h3>Learn AWS hacking from zero to hero with htARTE (HackTricks AWS Red Team Expert) ! Other ways to support HackTricks…</h3>
<p>book.hacktricks.xyz</p>
Before proceeding, I selected all available basic safety system messages for Azure's out-of-the-box controls:
This ensures that all newly created chatbots, regardless of Prompt Shield, will utilize these configurations.
Exploiting Prompty (w/ Basic Controls)
With basic protections in place, extracting sensitive information becomes alarmingly straightforward:
Notably, any complete document can be accessed by navigating to the references section at the end of the output:
Obtaining invoices is similarly easy, though the credit card number is not displayed in the initial output:
However, the credit card number can be revealed simply by clicking on the reference:
Welp.
No prompt injection is necessary! Caution is advised regarding the type of data uploaded to Azure AI applications.
Final Verdict: TRIVIAL
Exploiting Prompty (w/ Prompt Shield)
Following the previous straightforward attempt, I deployed a new application with a content filter. I activated Prompt Shield along with the protected material text and code options:
I maintained default settings and deployed a model capable of utilizing content filtering:
I started with the ChatGPT DAN 15.0 prompt:
Prompt Shield successfully blocked unauthorized access.
However, when I attempted the GODMODE prompt:
The results were swift.
The unmodified GPT-4O prompt by elder-plinius also failed, but slight modifications yielded different results:
The same trend continued with invoices:
Although the actual credit card number was omitted from the output, the reference documents contained it:
It seems elder-plinius’s indirect prompts are more effective. I tried transforming a PROMPTY-branded STAN prompt into binary, but Prompt Shield rejected it:
Direct injection is becoming increasingly challenging, yet indirect prompts remain effective. By applying elder-plinius's principles to a basic SQL query, I generated:
Notably, Prompty's response included the original invoices, complete with credit card information!
I continued creating queries, eventually finding a working Splunk query:
To my surprise, this query directly retrieved the credit card number.
In conclusion, I modified a Cortex XSOAR query:
Final Verdict: Googling Different Queries =
Conclusion
It appears that direct prompt injection attacks are consistently blocked by Prompt Shield, a commendable effort by Microsoft.
However, with a bit of ingenuity in crafting indirect prompts, Azure OpenAI applications can still be vulnerable.
For those interested in exploring further, my GitHub repository contains the prompts employed in this analysis:
<h2>GitHub - WibblyOWobbly/WideOpenAI: Short list of indirect prompt injection attacks for OpenAI-based…</h2>
<h3>Short list of indirect prompt injection attacks for OpenAI-based models. - GitHub - WibblyOWobbly/WideOpenAI: Short…</h3>
<p>github.com</p>
From my observations, minor adjustments to initial queries—such as adding or removing a search operator—can dramatically increase the chances of successfully jailbreaking the target application.
Thanks for reading! Please like, comment, subscribe, and connect with me on LinkedIn!