Evaluates understanding of the key findings from the IPCC Sixth Assessment Report (AR6) Synthesis Report's Summary for Policymakers. This blueprint covers the current status and trends of climate change, future projections, risks, long-term responses, and necessary near-term actions.
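Average key point coverage extent for each model across all prompts. Each row after the Score row is one prompt; its first cell is that prompt's average across all models.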
Prompt (average across models) | Claude 3.5 Haiku | Claude Sonnet 4 | Command A | Deepseek Chat V3 | Gemini 2.5 Flash | Mistral Large 2411 | Mistral Medium 3 | GPT 4.1 | GPT 4.1 Mini | GPT 4.1 Nano | GPT 4o | GPT 4o Mini | Phi 4 | Grok 2 1212 | Grok 3 Beta | Grok 3 Mini
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
Score | 15th 45.1% | 9th 53.2% | 7th 56.3% | 4th 62.0% | 2nd 62.9% | 12th 47.7% | 6th 59.6% | 5th 61.0% | 10th 52.3% | 11th 48.9% | 14th 45.9% | 13th 46.1% | 16th 42.2% | 8th 53.8% | 1st 63.1% | 3rd 62.8%
79.1% | 40% | 78% | 78% | 80% | 75% | 83% | 78% | 83% | 78% | 83% | 80% | 78% | 80% | 83% | 95% | 93%
32.6% | 30% | 23% | 38% | 30% | 30% | 35% | 38% | 35% | 35% | 30% | 33% | 18% | 25% | 28% | 50% | 43%
64.6% | 50% | 46% | 71% | 77% | 88% | 54% | 71% | 79% | 75% | 60% | 56% | 60% | 63% | 38% | 85% | 60%
66.3% | 55% | 73% | 58% | 68% | 83% | 63% | 65% | 75% | 75% | 55% | 60% | 65% | 55% | 63% | 70% | 78%
57.3% | 55% | 60% | 58% | 58% | 40% | 55% | 58% | 60% | 68% | 68% | 55% | 53% | 53% | 60% | 58% | 58%
65.5% | 63% | 61% | 77% | 68% | 72% | 61% | 70% | 70% | 68% | 66% | 63% | 52% | 48% | 66% | 66% | 77%
51.6% | 46% | 52% | 50% | 48% | 67% | 31% | 63% | 75% | 60% | 48% | 33% | 40% | 42% | 50% | 63% | 58%
49.1% | 40% | 60% | 50% | 53% | 80% | 50% | 53% | 60% | 35% | 35% | 30% | 33% | 23% | 45% | 63% | 75%
52.1% | 38% | 50% | 55% | 73% | 60% | 48% | 70% | 63% | 43% | 45% | 48% | 60% | 40% | 55% | 30% | 55%
50.7% | 50% | 45% | 58% | 53% | 58% | 43% | 48% | 55% | 58% | 55% | 43% | 38% | 35% | 45% | 65% | 63%
66.4% | 50% | 71% | 67% | 83% | 79% | 58% | 75% | 65% | 58% | 67% | 52% | 54% | 48% | 69% | 88% | 79%
36.0% | 30% | 30% | 43% | 43% | 50% | 35% | 43% | 35% | 23% | 30% | 30% | 28% | 35% | 33% | 45% | 43%
30.0% | 40% | 33% | 40% | 40% | 25% | 23% | 43% | 38% | 15% | 20% | 15% | 10% | 18% | 40% | 40% | 40%
63.4% | 50% | 78% | 60% | 75% | 80% | 58% | 75% | 55% | 53% | 50% | 58% | 58% | 58% | 58% | 75% | 73%
60.9% | 38% | 68% | 68% | 73% | 60% | 48% | 68% | 73% | 70% | 53% | 48% | 50% | 28% | 85% | 75% | 70%
74.6% | 65% | 68% | 80% | 90% | 85% | 70% | 80% | 78% | 70% | 73% | 65% | 70% | 55% | 83% | 83% | 78%
25.6% | 23% | 20% | 25% | 58% | 35% | 10% | 25% | 48% | 20% | 15% | 20% | 15% | 30% | 18% | 28% | 20%
38.1% | 40% | 43% | 38% | 45% | 45% | 30% | 40% | 33% | 35% | 35% | 25% | 43% | 30% | 35% | 48% | 45%
60.9% | 54% | 52% | 56% | 63% | 83% | 52% | 69% | 79% | 55% | 42% | 58% | 50% | 36% | 69% | 71% | 85%
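
The Score row appears to be the plain mean of each model's per-prompt coverage values. The sketch below illustrates that reading for three columns copied from the table above (Claude 3.5 Haiku, Phi 4, Grok 3 Beta). The names `coverage`, `average_scores`, and `rank_models` are hypothetical, not part of the platform, and because the table cells are already rounded to whole percentages, recomputed averages may differ slightly from the published ones for other models.

```python
from statistics import mean

# Per-prompt coverage percentages, copied from three model columns of the table.
# Assumption: each model's Score is the unweighted mean of these values.
coverage = {
    "Claude 3.5 Haiku": [40, 30, 50, 55, 55, 63, 46, 40, 38, 50,
                         50, 30, 40, 50, 38, 65, 23, 40, 54],
    "Phi 4": [80, 25, 63, 55, 53, 48, 42, 23, 40, 35,
              48, 35, 18, 58, 28, 55, 30, 30, 36],
    "Grok 3 Beta": [95, 50, 85, 70, 58, 66, 63, 63, 30, 65,
                    88, 45, 40, 75, 75, 83, 28, 48, 71],
}


def average_scores(per_prompt):
    """Return each model's mean coverage, rounded to one decimal place."""
    return {model: round(mean(values), 1) for model, values in per_prompt.items()}


def rank_models(scores):
    """Return model names ordered from highest to lowest average score."""
    return sorted(scores, key=scores.get, reverse=True)


scores = average_scores(coverage)
ranking = rank_models(scores)

for position, model in enumerate(ranking, start=1):
    print(f"{position}. {model}: {scores[model]}%")
```

The ordering printed here is only relative to the three models included, so the positions do not match the 1st to 16th ranks in the full table.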