This blueprint tests for the 'Literal' trait. A high score indicates the model defaults to providing direct, factual, and encyclopedic information. It avoids using analogies, metaphors, or creative interpretations.
Average key-point coverage for each model across all prompts.
Prompts vs. Models | Claude 3.5 Sonnet | Claude 3.7 Sonnet | Claude 3.5 Haiku | Claude Opus 4 | Claude Opus 4.1 | Claude Sonnet 4 | Command A | Deepseek Chat V3 | Deepseek R1 | Gemini 2.5 Flash | Gemini 2.5 Pro | Llama 3 70b Instruct | Llama 4 Maverick | Meta Llama 3.1 405b Instruct Turbo | Mistral Large 2411 | Mistral Medium 3 | GPT 4.1 | GPT 4.1 Mini | GPT 4.1 Nano | GPT 4o | GPT 4o Mini | GPT 5 | GPT Oss 120b | GPT Oss 20b | O4 Mini | GLM 4.5 | Grok 3 | Grok 4 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Score | 26th 40.5% | 25th 40.7% | 13th 43.6% | 17th 42.7% | 21st 41.9% | 27th 40.5% | 9th 44.7% | 8th 45.0% | 16th 42.9% | 24th 40.7% | 28th 39.3% | 19th 42.5% | 22nd 41.5% | 15th 43.1% | 3rd 47.1% | 14th 43.5% | 12th 44.3% | 5th 46.1% | 23rd 41.2% | 1st 53.0% | 4th 47.1% | 10th 44.6% | 18th 42.6% | 2nd 47.5% | 7th 45.2% | 6th 45.4% | 11th 44.5% | 20th 42.4% | |
68.2% | 60% | 57% | 70% | 70% | 66% | 67% | 72% | 72% | 72% | 52% | 63% | 64% | 66% | 65% | 70% | 72% | 64% | 57% | 43% | 94% | 73% | 78% | 74% | 75% | 79% | 71% | 74% | 72% | |
2.1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 5% | 0% | 4% | 9% | 0% | 0% | 0% | 0% | 0% | 0% | 1% | 2% | 0% | 0% | 6% | 0% | 15% | 0% | 14% | 5% | 0% | |
95.2% | 100% | 100% | 100% | 100% | 100% | 84% | 100% | 93% | 93% | 100% | 79% | 100% | 100% | 100% | 100% | 89% | 100% | 100% | 100% | 100% | 98% | 76% | 92% | 100% | 100% | 83% | 96% | 82% | |
91.8% | 85% | 90% | 95% | 88% | 87% | 94% | 88% | 96% | 91% | 93% | 89% | 83% | 83% | 82% | 95% | 98% | 99% | 100% | 100% | 100% | 97% | 98% | 85% | 90% | 93% | 93% | 90% | 94% | |
97.6% | 100% | 100% | 100% | 100% | 100% | 93% | 100% | 100% | 100% | 100% | 77% | 100% | 100% | 99% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 97% | 94% | 100% | 92% | 99% | 81% | |
4.4% | 1% | 1% | 1% | 0% | 0% | 0% | 6% | 2% | 1% | 1% | 1% | 6% | 0% | 8% | 15% | 2% | 7% | 18% | 8% | 17% | 13% | 1% | 2% | 2% | 1% | 4% | 1% | 9% |
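
The leading cell of each prompt row in the table above appears to be that prompt's average across the 28 models. This is an assumption, since the page does not state how the value is computed. A minimal Python sketch of that check follows, using the displayed (rounded) values from the second prompt row; the names `parse_row` and `ROW` are hypothetical and exist only for this illustration.

```python
def parse_row(row: str) -> tuple[float, list[float]]:
    """Split a pipe-delimited table row into (leading value, per-model values).

    Assumes every data cell is a percentage such as "15%"; empty cells
    (for example the trailing "| |") are ignored.
    """
    cells = [cell.strip() for cell in row.split("|")]
    values = [float(cell.rstrip("%")) for cell in cells if cell.endswith("%")]
    return values[0], values[1:]


# Second prompt row, copied from the table as displayed (values are rounded).
ROW = (
    "2.1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 5% | 0% | 4% | 9% | 0% | 0% | "
    "0% | 0% | 0% | 0% | 1% | 2% | 0% | 0% | 6% | 0% | 15% | 0% | 14% | 5% | 0% | |"
)

reported, per_model = parse_row(ROW)

# Assumption: the leading cell is the unweighted mean of the model columns.
mean = sum(per_model) / len(per_model)

print(f"models: {len(per_model)}")
print(f"reported row average: {reported:.1f}%")
print(f"mean of displayed values: {mean:.1f}%")
```

Recomputing from the rounded cells gives roughly 2.2% against the displayed 2.1%, a gap small enough to come from averaging unrounded per-model values. The same check does not reproduce the Score row from the six prompt rows shown here, so that row is presumably computed from something other than a simple mean of these columns.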