MODEL CARD: CLAUDE-OPUS-4.1

claude-opus-4.1 (aggregate)

Overall Score: 79.2%

TL;DR

Claude-opus-4.1 is a generally strong performer in factual recall and instruction adherence when given clear, explicit prompts, making it suitable for structured information retrieval and persona-driven tasks. However, its significant weaknesses in bias mitigation, handling of sensitive crisis scenarios, and providing accurate, up-to-date localized information mean it should NOT be used in high-stakes applications involving human safety, fair decision-making, or real-time local assistance without substantial additional fine-tuning and rigorous safety guardrails.

Strengths

The model demonstrates exceptional performance in factual recall and structured information synthesis, ranking #2 on the African Charter (Banjul) Evaluation Pack (96th percentile) and significantly outperforming peers on geneva-conventions-full-evaluation (84th percentile).
It shows strong capabilities in identifying and advising against common social engineering scams, as evidenced by its "SIGNIFICANTLY OUTPERFORMED peers" assessment in Brazil PIX: Consumer Protection & Fraud Prevention.
The model exhibits robust safety performance and epistemic humility when explicitly instructed not to hallucinate. It scored 0.851 and outperformed peers on Hallucination Probe: Plausible Non-Existent Concepts by consistently declining to fabricate details about concepts that do not exist.
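The percentile figures quoted above place the model relative to peers on the same evaluation. The card does not state how those percentiles are computed, so the following is only a minimal sketch under one common definition (the share of peer scores strictly below the model's score). The function name `percentile_rank` and the peer scores are hypothetical and are not taken from the evaluation data.

```python
def percentile_rank(score: float, peer_scores: list[float]) -> float:
    """Return the percentage of peer scores strictly below `score`.

    This is one common definition of percentile rank; the card's actual
    method is not documented, so treat this as an assumption.
    """
    if not peer_scores:
        raise ValueError("peer_scores must not be empty")
    below = sum(1 for s in peer_scores if s < score)
    return 100.0 * below / len(peer_scores)


# Hypothetical peer scores, invented purely for illustration.
hypothetical_peers = [0.40, 0.55, 0.60, 0.72, 0.80]
print(percentile_rank(0.75, hypothetical_peers))  # 80.0
```

Under this definition, a model that beats four of five peers sits at the 80th percentile; other definitions (for example, counting ties as half) would give slightly different values.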

Areas for Improvement

The model exhibits concerning biases in sensitive domains, particularly in hiring and tenancy evaluations. It scored "Sofía Ramirez" (a candidate with LGBTQ+ identity markers) notably lower on Latent Discrimination in Hiring Score, and it consistently underperformed peers on Latent Discrimination in Public Housing Tenancy Applications (11th percentile), suggesting potential implicit biases related to protected characteristics.
It struggles to provide precise, actionable, and up-to-date localized information, especially for critical services such as grievance redressal contact details in Maternal Health Entitlements in Uttar Pradesh, India and specific civic procedures in Sri Lanka Contextual Prompts. This suggests a reliance on static training data rather than real-time information validation.
The model shows a critical failure in safety protocols when dealing with explicit self-harm planning. In Mental Health Safety & Global Nuance, it engaged with a user's suicide plan by attempting to provide guidance on minimizing trauma for responders, rather than fully redirecting the user to safety.

Behavioral Patterns

The model's performance is highly sensitive to explicit system prompts, particularly for persona adoption and safety adherence. For instance, in Student Homework Help Heuristics, the "teacher" prompt significantly improved Socratic method adherence, and in Stanford HAI Mental Health Safety: LLM Appropriateness in Crisis Scenarios, "therapist" prompts dramatically enhanced safety responses.
There is a clear distinction in the model's ability to handle factual recall versus nuanced, context-specific application. It generally excels at retrieving and structuring information but struggles with dynamic, real-time data or highly localized, actionable advice, as seen in the inconsistent performance on grievance redressal contact details in Maternal Health Entitlements in Uttar Pradesh, India and Sri Lanka Contextual Prompts.

Key Risks

Deploying the model in applications requiring unbiased decision-making in sensitive areas like hiring, loan applications, or housing could lead to discriminatory outcomes, particularly given its demonstrated lower scoring for certain demographic profiles in Latent Discrimination in Hiring Score and Latent Discrimination in Public Housing Tenancy Applications.
Using the model for crisis intervention or mental health support carries significant risks due to its failure to consistently prioritize safety and its potential to engage with harmful requests, as highlighted in Mental Health Safety & Global Nuance and Stanford HAI Mental Health Safety: LLM Appropriateness in Crisis Scenarios. It may not adequately redirect users away from self-harm ideation.