This blueprint tests for the 'Figurative' trait. A high score indicates the model defaults to using analogies, metaphors, and creative interpretations to explain concepts and answer questions.
Average key point coverage extent for each model across all prompts. In each prompt row, the first cell is that prompt's average across the listed models.
Prompts vs. Models | Claude 3.5 Sonnet | Claude 3.7 Sonnet | Claude 3.5 Haiku | Claude Opus 4 | Claude Opus 4.1 | Claude Sonnet 4 | Command A | Deepseek Chat V3 | Deepseek R1 | Gemini 2.5 Flash | Gemini 2.5 Pro | Llama 3 70b Instruct | Llama 4 Maverick | Meta Llama 3.1 405b Instruct Turbo | Mistral Large 2411 | Mistral Medium 3 | GPT 4.1 | GPT 4.1 Mini | GPT 4.1 Nano | GPT 4o | GPT 4o 2024-05-13 | GPT 4o 2024-08-06 | GPT 4o 2024-11-20 | GPT 4o Mini | GPT 5 | GPT Oss 120b | GPT Oss 20b | O4 Mini | GLM 4.5 | Grok 3 | Grok 4 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Score | 23rd 49.6% | 20th 49.7% | 30th 46.1% | 8th 56.4% | 5th 59.6% | 11th 53.4% | 21st 49.7% | 12th 52.9% | 10th 53.7% | 7th 58.2% | 1st 71.4% | 16th 50.2% | 27th 47.1% | 26th 48.2% | 31st 45.5% | 14th 50.9% | 9th 53.8% | 15th 50.8% | 17th 50.2% | 25th 48.5% | 28th 47.0% | 22nd 49.7% | 13th 51.6% | 29th 46.5% | 24th 49.1% | 6th 59.5% | 4th 62.7% | 18th 50.1% | 3rd 63.1% | 19th 49.8% | 2nd 69.9% | |
60.2% | 50% | 59% | 50% | 69% | 70% | 57% | 57% | 59% | 57% | 59% | 85% | 59% | 64% | 50% | 55% | 59% | 54% | 52% | 60% | 54% | 52% | 51% | 54% | 50% | 68% | 56% | 76% | 56% | 83% | 57% | 91% | |
98.1% | 98% | 100% | 100% | 99% | 99% | 100% | 95% | 100% | 100% | 99% | 100% | 100% | 99% | 100% | 89% | 98% | 99% | 99% | 99% | 100% | 98% | 100% | 97% | 100% | 100% | 98% | 85% | 100% | 100% | 100% | 94% | |
46.9% | 43% | 34% | 44% | 41% | 40% | 51% | 53% | 37% | 49% | 51% | 60% | 47% | 45% | 43% | 46% | 37% | 42% | 49% | 40% | 47% | 55% | 41% | 42% | 40% | 43% | 60% | 55% | 53% | 58% | 61% | 53% | |
100.0% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | |
35.2% | 29% | 33% | 21% | 38% | 52% | 37% | 23% | 29% | 25% | 60% | 87% | 30% | 13% | 29% | 16% | 30% | 39% | 39% | 38% | 22% | 8% | 28% | 30% | 19% | 4% | 56% | 75% | 9% | 62% | 19% | 93% | |
11.4% | 9% | 6% | 1% | 21% | 22% | 11% | 9% | 20% | 22% | 10% | 24% | 5% | 6% | 3% | 5% | 12% | 15% | 3% | 2% | 7% | 10% | 10% | 16% | 6% | 19% | 16% | 12% | 20% | 14% | 7% | 17% |