Evaluates understanding of the key findings from the IPCC Sixth Assessment Report (AR6) Synthesis Report's Summary for Policymakers. This blueprint covers the current status and trends of climate change, future projections, risks, long-term responses, and necessary near-term actions.
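Average key-point coverage for each model across all prompts. In the table, the first cell of each prompt row appears to be that prompt's average across models, and the Score row gives each model's rank and its average across prompts.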
Prompts vs. Models | Claude 3.5 Haiku | Claude Sonnet 4 | Command A | Deepseek Chat V3 | Deepseek R1 | Gemini 2.5 Flash | Gemini 2.5 Pro Preview 05 06 | Mistral Large 2411 | Mistral Medium 3 | GPT 4.1 | GPT 4.1 Mini | GPT 4.1 Nano | GPT 4o | GPT 4o Mini | O4 Mini | Grok 3 | Grok 3 Mini | Grok 4 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Score | 17th 45.1% | 11th 53.2% | 10th 56.3% | 5th 62.0% | 6th 61.5% | 2nd 62.9% | 18th 44.1% | 14th 47.7% | 8th 59.6% | 7th 61.0% | 12th 52.3% | 13th 48.9% | 16th 45.9% | 15th 46.1% | 9th 59.6% | 4th 62.5% | 3rd 62.8% | 1st 73.2% | |
78.1% | 40% | 78% | 78% | 80% | 78% | 75% | 60% | 83% | 78% | 83% | 78% | 83% | 80% | 78% | 75% | 90% | 93% | 95% | |
35.1% | 30% | 23% | 38% | 30% | 38% | 30% | 40% | 35% | 38% | 35% | 35% | 30% | 33% | 18% | 35% | 40% | 43% | 60% | |
66.6% | 50% | 46% | 71% | 77% | 77% | 88% | 0% | 54% | 71% | 79% | 75% | 60% | 56% | 60% | 92% | 83% | 60% | 100% | |
70.2% | 55% | 73% | 58% | 68% | 53% | 83% | 78% | 63% | 65% | 75% | 75% | 55% | 60% | 65% | 85% | 85% | 78% | 90% | |
58.3% | 55% | 60% | 58% | 58% | 65% | 40% | 60% | 55% | 58% | 60% | 68% | 68% | 55% | 53% | 55% | 60% | 58% | 63% | |
64.6% | 63% | 61% | 77% | 68% | 79% | 72% | 39% | 61% | 70% | 70% | 68% | 66% | 63% | 52% | 20% | 75% | 77% | 82% | |
54.2% | 46% | 52% | 50% | 48% | 83% | 67% | 25% | 31% | 63% | 75% | 60% | 48% | 33% | 40% | 58% | 56% | 58% | 83% | |
52.4% | 40% | 60% | 50% | 53% | 50% | 80% | 40% | 50% | 53% | 60% | 35% | 35% | 30% | 33% | 48% | 75% | 98% | ||
54.8% | 38% | 50% | 55% | 73% | 65% | 60% | 60% | 48% | 70% | 63% | 43% | 45% | 48% | 60% | 58% | 30% | 55% | 65% | |
51.2% | 50% | 45% | 58% | 53% | 68% | 58% | 50% | 43% | 48% | 55% | 58% | 55% | 43% | 38% | 33% | 53% | 63% | ||
67.9% | 50% | 71% | 67% | 83% | 83% | 79% | 27% | 58% | 75% | 65% | 58% | 67% | 52% | 54% | 83% | 90% | 79% | 81% | |
35.9% | 30% | 30% | 43% | 43% | 53% | 50% | 25% | 35% | 43% | 35% | 23% | 30% | 30% | 28% | 30% | 48% | 43% | 28% | |
32.1% | 40% | 33% | 40% | 40% | 40% | 25% | 18% | 23% | 43% | 38% | 15% | 20% | 15% | 10% | 40% | 40% | 40% | 58% | |
65.3% | 50% | 78% | 60% | 75% | 55% | 80% | 60% | 58% | 75% | 55% | 53% | 50% | 58% | 58% | 63% | 80% | 73% | 95% | |
64.4% | 38% | 68% | 68% | 73% | 68% | 60% | 75% | 48% | 68% | 73% | 70% | 53% | 48% | 50% | 88% | 73% | 70% | 68% | |
74.3% | 65% | 68% | 80% | 90% | 68% | 85% | 55% | 70% | 80% | 78% | 70% | 73% | 65% | 70% | 83% | 85% | 78% | ||
27.6% | 23% | 20% | 25% | 58% | 40% | 35% | 8% | 10% | 25% | 48% | 20% | 15% | 20% | 15% | 45% | 40% | 20% | 30% | |
40.9% | 40% | 43% | 38% | 45% | 48% | 45% | 38% | 30% | 40% | 33% | 35% | 35% | 25% | 43% | 58% | 43% | 45% | 53% | |
65.1% | 54% | 52% | 56% | 63% | 58% | 83% | 79% | 52% | 69% | 79% | 55% | 42% | 58% | 50% | 71% | 69% | 85% | 96% |
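The averages in the table can be spot-checked by hand. The sketch below is a minimal check, not the dashboard's own scoring code: it assumes both averages are plain unweighted means of the rounded percentages displayed above, and that the first value in every prompt row belongs to Claude 3.5 Haiku (three rows are missing one cell, so this is an assumption). The names `first_prompt_row`, `haiku_column`, and `mean_pct` are illustrative.

```python
# Values copied from the table above (rounded percentages as displayed).
# Assumption: averages are unweighted means of these displayed values.

# First prompt row: one value per model, in header order (18 models).
first_prompt_row = [40, 78, 78, 80, 78, 75, 60, 83, 78,
                    83, 78, 83, 80, 78, 75, 90, 93, 95]

# Claude 3.5 Haiku column: one value per prompt row (19 prompts).
# Assumption: the first model cell of every row is Claude 3.5 Haiku.
haiku_column = [40, 30, 50, 55, 55, 63, 46, 40, 38, 50,
                50, 30, 40, 50, 38, 65, 23, 40, 54]


def mean_pct(values):
    """Unweighted mean, rounded to one decimal place like the table."""
    return round(sum(values) / len(values), 1)


row_avg = mean_pct(first_prompt_row)   # compare with the row's first cell
col_avg = mean_pct(haiku_column)       # compare with the Score row

print(f"First prompt row average: {row_avg}%")
print(f"Claude 3.5 Haiku average: {col_avg}%")
```

Under these assumptions the results agree with the 78.1% shown at the start of the first prompt row and the 45.1% shown for Claude 3.5 Haiku in the Score row. Because the displayed cells are already rounded, other rows or columns may differ from the published averages by a few tenths of a point.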