Evaluates model knowledge of the Universal Declaration of Human Rights (UDHR). Prompts cover the Preamble and key articles on fundamental rights (e.g., life, liberty, equality, privacy, expression). Includes a scenario to test reasoning on balancing competing rights.
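Average key point coverage for each model across all prompts. Each row after Score is one prompt, and its first cell is that prompt's mean across all models; the Score row gives each model's rank and its mean across all prompts.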
Prompts vs. Models | Claude 3.5 Haiku | Claude Sonnet 4 | Command A | Deepseek Chat V3 | Deepseek R1 | Gemini 2.5 Flash | Gemini 2.5 Pro Preview 05 06 | Mistral Large 2411 | Mistral Medium 3 | GPT 4.1 | GPT 4.1 Mini | GPT 4.1 Nano | GPT 4o | GPT 4o Mini | O4 Mini | Grok 3 | Grok 3 Mini | Grok 4
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
Score | 15th 90.5% | 11th 93.7% | 8th 96.3% | 14th 91.1% | 7th 96.3% | 5th 97.2% | 16th 85.5% | 10th 94.5% | 13th 91.6% | 3rd 97.7% | 9th 94.5% | 18th 81.9% | 12th 92.6% | 17th 82.1% | 6th 96.5% | 2nd 98.3% | 4th 97.3% | 1st 99.0%
99.1% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 84% | 100% | 100% | 100% | 100% | 100% | 100%
97.9% | 100% | 98% | 100% | 96% | 100% | 100% | 100% | 100% | 86% | 100% | 100% | 100% | 100% | 83% | 100% | 100% | 100% | 100%
98.9% | 98% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 85% | 100% | 98% | 100% | 100% | 100% | 100%
99.3% | 94% | 98% | 100% | 100% | 100% | 95% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100%
91.6% | 73% | 92% | 100% | 73% | 100% | 100% | 94% | 100% | 98% | 98% | 97% | 80% | 84% | 67% | 98% | 97% | 98% | 100%
94.6% | 100% | 95% | 98% | 100% | 95% | 100% | 100% | 100% | 100% | 100% | 90% | 70% | 93% | 68% | 93% | 100% | 100% | 100%
91.5% | 86% | 100% | 100% | 99% | 100% | 100% | 53% | 90% | 88% | 100% | 100% | 60% | 100% | 71% | 100% | 100% | 100% | 100%
100.0% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100%
100.0% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100%
92.6% | 100% | 100% | 100% | 72% | 100% | 100% | 100% | 78% | 75% | 100% | 100% | 100% | 72% | 69% | 100% | 100% | 100% | 100%
67.3% | 60% | 58% | 73% | 65% | 73% | 85% | 40% | 75% | 72% | 76% | 47% | 32% | 73% | 56% | 67% | 88% | 78% | 93%
84.8% | 75% | 83% | 84% | 88% | 88% | 86% | 39% | 91% | 80% | 98% | 100% | 72% | 89% | 73% | 100% | 94% | 92% | 95%
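
The page does not state how the Score row is aggregated, but a plain mean of each model's twelve per-prompt coverage values reproduces the displayed figures for the three columns checked here. The sketch below is a minimal illustration under that assumption; the function names `model_scores` and `rank_models` are hypothetical, and the ranks it produces apply only within this three-model subset, not the full 18-model table.

```python
# Per-prompt coverage (%) copied from three columns of the table above.
coverage = {
    "Claude 3.5 Haiku": [100, 100, 98, 94, 73, 100, 86, 100, 100, 100, 60, 75],
    "GPT 4.1 Nano": [84, 100, 85, 100, 80, 70, 60, 100, 100, 100, 32, 72],
    "Grok 4": [100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 93, 95],
}


def model_scores(coverage):
    """Unweighted mean coverage per model (assumed aggregation)."""
    return {model: sum(vals) / len(vals) for model, vals in coverage.items()}


def rank_models(scores):
    """Rank models from highest to lowest mean score, starting at 1."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {model: position + 1 for position, model in enumerate(ordered)}


scores = model_scores(coverage)
ranks = rank_models(scores)

for model in coverage:
    print(f"{model}: rank {ranks[model]} of {len(ranks)}, {scores[model]:.1f}%")
```

Rounded to one decimal place, the means come out to 90.5%, 81.9%, and 99.0%, matching the Score row for these models.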