Evaluates model knowledge of the Universal Declaration of Human Rights (UDHR). Prompts cover the Preamble and key articles on fundamental rights (e.g., life, liberty, equality, privacy, expression). Includes a scenario to test reasoning on balancing competing rights.
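Average key point coverage for each model across all prompts. The Score row shows each model's rank and mean coverage; in each row below it, the first column is that prompt's mean coverage across all models.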
Prompts vs. Models | Claude 3.5 Haiku | Claude Sonnet 4 | Command A | Deepseek Chat V3 | Gemini 2.5 Flash | Gemini 2.5 Pro Preview 05 06 | Mistral Large 2411 | Mistral Medium 3 | GPT 4.1 | GPT 4.1 Mini | GPT 4.1 Nano | GPT 4o | GPT 4o Mini | Grok 3 | Grok 3 Mini |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Score | 12th 94.2% | 8th 96.8% | 5th 98.1% | 11th 94.8% | 4th 98.6% | 14th 87.5% | 6th 97.0% | 9th 95.6% | 3rd 98.9% | 6th 97.0% | 15th 86.2% | 10th 95.3% | 13th 88.3% | 2nd 98.9% | 1st 99.2% |
98.3% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 100% | 100% | 100% |
98.7% | 100% | 98% | 100% | 100% | 100% | 92% | 100% | 98% | 100% | 100% | 100% | 100% | 92% | 100% | 100% |
99.1% | 100% | 100% | 100% | 100% | 100% | 98% | 100% | 98% | 100% | 100% | 92% | 100% | 98% | 100% | 100% |
99.5% | 95% | 100% | 100% | 100% | 98% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
93.0% | 83% | 92% | 100% | 77% | 100% | 97% | 100% | 100% | 100% | 98% | 83% | 92% | 75% | 98% | 100% |
96.5% | 98% | 95% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 83% | 95% | 78% | 98% | 100% |
90.5% | 90% | 100% | 100% | 100% | 100% | 55% | 90% | 89% | 100% | 100% | 60% | 100% | 73% | 100% | 100% |
100.0% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
100.0% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
94.7% | 100% | 100% | 100% | 91% | 100% | 88% | 88% | 88% | 100% | 100% | 100% | 81% | 84% | 100% | 100% |
79.4% | 76% | 82% | 85% | 79% | 90% | 56% | 92% | 82% | 87% | 66% | 58% | 81% | 71% | 94% | 92% |
91.3% | 88% | 94% | 92% | 91% | 95% | 64% | 94% | 92% | 100% | 100% | 83% | 94% | 88% | 97% | 98% |
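
The averages in the table can be spot-checked with a few lines of Python. This is only a sketch under stated assumptions: it treats each summary figure as an unweighted mean of the rounded percentages shown above, which reproduces the displayed values for the columns and row checked here, but the dashboard may well compute its scores from unrounded or weighted data. The names `coverage`, `first_prompt_row`, and `mean_pct` are hypothetical and used only for illustration.

```python
from statistics import mean

# Per-prompt coverage (%) copied from the table, top to bottom,
# for two of the model columns.
coverage = {
    "Claude 3.5 Haiku": [100, 100, 100, 95, 83, 98, 90, 100, 100, 100, 76, 88],
    "Grok 3 Mini": [100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 92, 98],
}

# First prompt row, left to right: every model at 100% except GPT 4.1 Nano (75%).
first_prompt_row = [100] * 10 + [75] + [100] * 4


def mean_pct(values):
    """Unweighted mean rounded to one decimal place (assumed aggregation)."""
    return round(mean(values), 1)


# Column means correspond to the Score row; a row mean to the first column.
scores = {model: mean_pct(values) for model, values in coverage.items()}
first_prompt_avg = mean_pct(first_prompt_row)

print(scores)
print(first_prompt_avg)
```

Under these assumptions the column means match the Score row for both models, and the row mean matches the first-column figure for the first prompt.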