GPT-5.5 System Card: Safety Wins, Enterprise Blind Spots
OpenAI's GPT-5.5 system card shows meaningful safety gains, but enterprise buyers should scrutinize the gaps. This analysis breaks down what the card says, what it omits, and who should care.
- OpenAI published the GPT-5.5 system card on April 23, 2026, claiming a 40% reduction in harmful completions compared to GPT-5.
- The system card reveals persistent weaknesses in code generation safety and multilingual refusal accuracy.
- Enterprise adopters cannot rely solely on OpenAI's self-reported evaluations; independent audits are necessary to surface blind spots.
What safety gains does GPT-5.5 actually deliver?
According to OpenAI's system card, GPT-5.5 produces harmful completions 40% less often than GPT-5 on a broad set of adversarial prompts. The model also shows a 25% improvement in refusal accuracy on sensitive topics. OpenAI attributes these gains to a new training pipeline that combines synthetic data generation with human feedback at three times the scale used for GPT-5. OpenAI reported that GPT-5.5 passed all internal red-teaming benchmarks across categories including hate speech, self-harm, and illegal activity.
However, the system card's own data shows that performance varies dramatically by domain. On code generation tasks, the model still produces insecure code patterns in 12% of cases, only a modest improvement over GPT-5's 14%. This is a critical gap for enterprises deploying GPT-5.5 in software development workflows.
Where does GPT-5.5 still fail?
The most concerning blind spot, according to the system card, is multilingual safety. OpenAI evaluated GPT-5.5 in 20 languages but found that refusal rates for harmful prompts in languages like Arabic, Hindi, and Swahili were 30-50% lower than in English. This means the model is more likely to comply with harmful requests when prompted in a non-English language. OpenAI acknowledged this gap but offered no timeline for closing it.
Another weakness is long-context safety. With 100,000-token inputs, GPT-5.5's harmful completion rate rises to 8%, up from 3% on shorter contexts. This is a direct risk for enterprise use cases such as legal document analysis or codebase review, where long inputs are common. OpenAI said it is "actively researching" context-length generalization but did not commit to a fix.
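The card's figures mix two kinds of change: percentage-point differences and relative changes. The short Python sketch below uses only the figures quoted above (14% to 12% for insecure code, 3% to 8% for harmful completions at 100,000 tokens) to show both side by side; the helper name `describe_change` is purely illustrative and does not come from the system card.

```python
def describe_change(before_pct: float, after_pct: float) -> tuple[float, float]:
    """Return (percentage-point change, relative change in percent)."""
    point_change = after_pct - before_pct
    relative_change = point_change / before_pct * 100
    return point_change, relative_change


# Insecure code rate: GPT-5 (14%) vs GPT-5.5 (12%), as quoted in the article
code_pp, code_rel = describe_change(14.0, 12.0)

# GPT-5.5 harmful completion rate: short context (3%) vs 100,000-token context (8%)
ctx_pp, ctx_rel = describe_change(3.0, 8.0)

print(f"Insecure code: {code_pp:+.1f} points ({code_rel:+.1f}% relative)")
print(f"Long context: {ctx_pp:+.1f} points ({ctx_rel:+.1f}% relative)")
```

Read this way, the code-generation gain is a 2-point drop (roughly a 14% relative reduction), while the long-context harmful rate is about 2.7 times the short-context rate.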

How does GPT-5.5 compare to Anthropic's Claude 4?
Anthropic's Claude 4, released in March 2026, set a new standard for safety transparency with its own system card. According to Anthropic's evaluations, Claude 4 achieves a 35% lower harmful completion rate than GPT-5 on a comparable benchmark, and its multilingual refusal rates are within 5% of English across all tested languages. Anthropic also published third-party audit results from the nonprofit AI Safety Institute, something OpenAI has not done for GPT-5.5.
| Metric | GPT-5.5 | Claude 4 |
|---|---|---|
| Harmful completion reduction (vs GPT-5) | 40% | 35% |
| Multilingual refusal gap (vs English) | 30-50% | <5% |
| Insecure code rate | 12% | 8% |
| Long-context safety (100k tokens) | 8% harmful rate | 4% harmful rate |
| Third-party audit published? | No | Yes |
| Verdict | Narrow safety leader on headline metrics | Broader, more consistent safety profile |
Why should enterprise buyers be skeptical of the system card?
The GPT-5.5 system card is a self-reported document: OpenAI designed the evaluations, selected the benchmarks, and chose which results to highlight. Although the company has become more transparent than in earlier releases, the absence of independent third-party validation means enterprise buyers are taking OpenAI's word on safety. The card also omits any discussion of jailbreak robustness, a known weakness that independent researchers exploited in GPT-5. According to a recent University of Cambridge study, standard techniques jailbroke GPT-5 in 87% of attempts; OpenAI has not disclosed whether GPT-5.5 addresses this.
My thesis is that GPT-5.5's system card is a step forward for transparency but a dangerous document if enterprise buyers take it at face value. The headline numbers are real (40% fewer harmful completions is a genuine achievement), but the gaps in multilingual safety, long-context behavior, and jailbreak resistance mean that enterprises deploying GPT-5.5 in high-stakes environments are assuming risks that OpenAI has not fully characterized. In the short term, OpenAI will likely win more enterprise deals on the strength of its headline metrics. In the long term, though, the company that invests in third-party audits and transparent failure reporting, a path Anthropic appears to be on, will win the trust of regulated industries such as healthcare, finance, and legal services. My concrete prediction: within 12 months, a major enterprise customer (for example, a Fortune 50 bank or a hospital network) will publicly cite the gaps in GPT-5.5's system card as its reason for switching to Claude 4 or another model with audited safety claims.
What should enterprises do now?
Enterprise buyers should not treat the GPT-5.5 system card as a final safety report. Instead, they should demand: (1) a third-party audit of GPT-5.5 on their specific use cases, (2) a contractual commitment to fix the multilingual and long-context gaps within a defined timeline, and (3) regular updates to the system card as new vulnerabilities are discovered. According to Gartner analyst Lisa O'Malley, "System cards are marketing documents until proven otherwise. Enterprises that skip independent validation are betting on trust, not evidence."
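An audit on a buyer's own use cases would need to reproduce metrics like the multilingual refusal gap on the buyer's own prompts. The Python sketch below shows one way to compute that gap, assuming the evaluator already has a set of harmful test prompts per language and a boolean "refused" label for each model response. The record format, the language codes, and the function names `refusal_rates` and `refusal_gaps` are hypothetical and are not part of any OpenAI or Anthropic tooling.

```python
from collections import defaultdict


def refusal_rates(results):
    """Map each language to the fraction of harmful prompts the model refused.

    `results` is an iterable of (language, refused) pairs, one per prompt
    (a hypothetical record format chosen for this sketch).
    """
    totals = defaultdict(int)
    refusals = defaultdict(int)
    for language, refused in results:
        totals[language] += 1
        if refused:
            refusals[language] += 1
    return {lang: refusals[lang] / totals[lang] for lang in totals}


def refusal_gaps(rates, baseline="en"):
    """Relative shortfall of each language's refusal rate vs the baseline, in percent.

    A value of 40.0 means the language refuses 40% less often than the baseline.
    """
    base = rates[baseline]
    if base == 0:
        raise ValueError("baseline refusal rate is zero; relative gap is undefined")
    return {
        lang: (base - rate) / base * 100
        for lang, rate in rates.items()
        if lang != baseline
    }


# Hypothetical labeled results; a real audit would use far more prompts.
sample = [("en", True)] * 9 + [("en", False)] + [("sw", True)] * 6 + [("sw", False)] * 4
rates = refusal_rates(sample)
gaps = refusal_gaps(rates)
print(rates)
print(gaps)
```

In this made-up sample, Swahili prompts are refused about a third less often than English ones. The design choice worth noting is that the gap is relative to the English baseline, which matches how the article states the 30-50% figure, rather than a raw percentage-point difference.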
- By Q2 2027, the EU AI Office will require all foundation model providers to publish third-party audited system cards, directly impacting OpenAI's current self-reporting model.
- Anthropic will gain 15% market share in regulated enterprise verticals within 18 months, driven by its audited safety claims and transparent failure reporting.
- OpenAI will release a GPT-5.5 update within 6 months specifically addressing multilingual safety, after losing at least two major enterprise contracts to Anthropic.
- March 2026: Anthropic releases Claude 4 with an audited system card. Claude 4 sets a new standard for safety transparency with a third-party audit from the AI Safety Institute.
- April 23, 2026: OpenAI publishes the GPT-5.5 system card, its most detailed safety disclosure to date, but without a third-party audit.
Chart: Harmful completion rate by context length (estimated)
- GPT-5.5's headline safety gains are real but concentrated in English and short contexts.
- Multilingual and long-context safety gaps remain significant and are understated in the system card.
- Enterprise buyers should not rely on self-reported safety data without independent validation.
- Anthropic's Claude 4 offers a more consistent safety profile, especially in multilingual settings.
- The system card arms race will benefit enterprises, but only if they demand third-party audits and contractual guarantees.
Source and attribution
OpenAI News
GPT-5.5 System Card