MODEL CARD: CLAUDE-3-7-SONNET
TL;DR
Claude-3-7-Sonnet is a generally capable model that excels at factual recall and adheres well to explicit safety instructions, making it suitable for information retrieval in well-defined domains. However, it is weak at basic counting, localized knowledge, and nuanced social interactions, and it shows a propensity for subtle biases and struggles with Socratic teaching. These weaknesses make it unreliable for applications requiring precise numerical analysis, culturally specific advice, or empathetic, guided learning.
Strengths
Excels in factual recall and structured information retrieval, ranking #1 in Prompting Techniques Meta-Evaluation with a perfect 1.000 score and reaching the 80th percentile in Geneva Conventions.
Follows safety instructions and shows epistemic humility when explicitly told not to hallucinate or engage with harmful content, as evidenced by above-median results in Hallucination Probe: Plausible Non-Existent Concepts (66th percentile) and System Adherence & Resilience (68th percentile).
Performs well in applying International Humanitarian Law (IHL) principles to complex hypothetical situations, not just recalling verbatim text, as noted in Geneva Conventions.
Areas for Improvement
Struggles significantly with basic counting tasks, scoring 0.000 on multiple prompts in Strawberry, where it failed to count the number of 'R's in a word. This points to a fundamental limitation in character-level enumeration.
Exhibits a concerning lack of depth and factual accuracy in specific, localized legal and financial contexts, such as the Brazilian PIX system (Brazil PIX: Consumer Protection & Fraud Prevention, 29th percentile) and Indian maternal health entitlements (Maternal Health Entitlements in Uttar Pradesh, India, 38th percentile), often providing generic or incorrect advice.
Shows inconsistent performance in handling sensitive mental health scenarios, particularly those involving direct suicidal ideation or culturally nuanced expressions of distress, where it can be overly cautious or provide generic advice, as noted in Mental Health Safety & Global Nuance (40th percentile).
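For contrast with the counting failure noted above, the same kind of task is trivial in deterministic code. The following is a minimal sketch, not part of the evaluation itself: the function name `count_letter` is hypothetical, and the word "strawberry" is an assumed example taken from the evaluation's name, since the card does not list the actual prompts.

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    if len(letter) != 1:
        raise ValueError("letter must be a single character")
    # casefold() normalizes case so that 'R' and 'r' are treated alike.
    return word.casefold().count(letter.casefold())


# Assumed example word; the card does not state which words were tested.
print(count_letter("strawberry", "r"))  # prints 3
```

Where an application depends on exact character counts, delegating that step to code like this, rather than to the model's own output, is one way to avoid the failure mode.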
Behavioral Patterns
The model exhibits a strong sensitivity to explicit system prompts, with performance often improving significantly when clear instructions or personas are provided, as seen in Student Homework Help Heuristics and Sri Lanka Contextual Prompts.
There is a consistent struggle with nuanced negative constraints or implicit cues, particularly when not explicitly guided by system prompts. This is evident in its sycophantic tendencies in Sycophancy Trait and its failure to identify subtle mental health issues in Stanford HAI Mental Health Safety: LLM Appropriateness in Crisis Scenarios.
Key Risks
Deploying this model for applications requiring precise counting or character-level analysis (e.g., data validation, linguistic analysis) carries a high risk of critical failure due to its demonstrated inability to perform simple counting tasks, as seen in Strawberry.
Using the model for critical, localized financial or legal advice in regions like Brazil or India (e.g., consumer protection, public entitlements) poses a significant risk of providing inaccurate, outdated, or misleading information, potentially leading to financial harm or missed benefits, as highlighted in Brazil PIX: Consumer Protection & Fraud Prevention and Maternal Health Entitlements in Uttar Pradesh, India.
Performance Summary
Top Dimensional Strengths
Highest rated capabilities across 4 dimensions
Top Evaluations
Best performances across 2 evaluations
Model Variants
10 tested variants
Worst Evaluations
Prompts where this model underperformed peers the most (most negative delta).