OpenAI Develops AI System to Protect ChatGPT Atlas Against Prompt Injection Attacks
In Focus
- OpenAI says prompt injection attacks won’t be resolved fully
- The AI security challenge evolves as capabilities in AI systems expand
- The AI-powered system identifies new attacks and uses them to improve itself
OpenAI has admitted that prompt injections in AI browsers with agentic capabilities will continue to be a major risk. According to TechCrunch, this admission raises concerns about how AI agents can operate safely on the web. Prompt injection is a form of security flaw hidden in emails or web pages that manipulates AI agents into following malicious instructions.
Prompt Injection is a Long-Term AI Security Challenge
OpenAI’s prompt injection update acknowledged the continuing nature of the security challenge. The update further noted that the challenge continues to evolve as the capabilities of AI systems expand and develop to take more actions.
“We view prompt injection as a long-term AI security challenge, and we’ll need to continuously strengthen our defenses against it, much like ever-evolving online scams that target humans,” OpenAI wrote in a blog post.
OpenAI unveiled the Atlas browser in October 2025, weeks before Perplexity AI launched its AI-powered browser, Comet, on Android. At the time, news about indirect prompt injection becoming a systemic challenge for AI-powered browsers had begun to emerge.
How Do Prompt Injection Attacks Happen in AI Browsers?
OpenAI explained how prompt injection in agent mode occurs in AI browsers like ChatGPT Atlas. According to the AI firm, the attacks can be direct or indirect. In direct attempts, attackers try to override an AI model’s instructions.
Indirect attempts involve embedding of malicious prompts within ordinary content. AI browsers with agentic capabilities read and reason beyond third-party pages. As they do so, they interact with instructions that are aimed at influencing their behavior and act on them.
OpenAI highlights the ChatGPT Atlas prompt injection challenge, weeks after it issued a global security warning following the Mixpanel breach. The breach exposed limited API-related data, prompting renewed scrutiny of third-party systems used in enterprise AI environments.
OpenAI’s Approach to Handling Prompt Injection
OpenAI is using an automated AI-based attacker to handle the prompt injection challenge and reduce ChatGPT Atlas vulnerabilities. The attacker simulates actual prompt injection attacks to train the Atlas browser with a view of hardening the system.
“We trained this attacker end-to-end with reinforcement learning, so it learns from its own successes and failures to improve its red teaming skills. We also let it “try before it ships”, by which we mean: during its chain of thought reasoning, the attacker can propose a candidate injection and send it to an external simulator,” OpenAI mentioned in its blog post.
The AI lab says its attacker is designed to spot new attacks quickly and uses them to retrain and improve the system, the way cybersecurity teams adjust defenses as threats evolve. This makes it easier for the company to identify security vulnerabilities in the AI browser and update defenses frequently, instead of relying on manual tests.
