This blueprint tests for the 'Literal' trait. A high score indicates that the model defaults to direct, factual, encyclopedic information and avoids analogies, metaphors, and creative interpretations.
Average key point coverage extent for each model across all prompts.
Prompt (mean across models) | Claude 3.5 Sonnet | Claude 3.7 Sonnet | Claude 3.5 Haiku | Claude Sonnet 4 | Command A | Deepseek Chat V3 | Gemini 2.5 Flash | Llama 3 70b Instruct | Llama 4 Maverick | Mistral Large 2411 | Mistral Medium 3 | GPT 4.1 | GPT 4.1 Mini | GPT 4.1 Nano | GPT 4o | GPT 4o Mini | GPT Oss 120b | GPT Oss 20b | GLM 4.5
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
Score (rank) | 71.2% (11th) | 66.6% (18th) | 73.0% (6th) | 72.6% (8th) | 73.0% (6th) | 69.2% (15th) | 70.2% (14th) | 71.0% (12th) | 69.2% (15th) | 73.8% (3rd) | 70.8% (13th) | 71.6% (10th) | 73.8% (3rd) | 68.4% (17th) | 80.0% (1st) | 73.4% (5th) | 71.8% (9th) | 78.4% (2nd) | 66.0% (19th)
68.6% | 60% | 54% | 71% | 67% | 73% | 67% | 65% | 67% | 67% | 73% | 73% | 60% | 69% | 42% | 100% | 75% | 73% | 75% | 73%
1.6% | 0% | 0% | 0% | 0% | 0% | 8% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 23% | 0%
95.9% | 100% | 100% | 100% | 100% | 100% | 73% | 100% | 100% | 100% | 98% | 83% | 100% | 100% | 100% | 100% | 100% | 94% | 100% | 75%
93.7% | 96% | 79% | 94% | 96% | 92% | 98% | 94% | 88% | 81% | 98% | 98% | 98% | 100% | 100% | 100% | 92% | 92% | 94% | 90%
99.1% | 100% | 100% | 100% | 100% | 100% | 100% | 92% | 100% | 98% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 92%
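The numbers in the table are consistent with simple averaging: each model's score equals the unweighted mean of its five per-prompt coverage values, the first cell of each prompt row equals the mean across all 19 models, and tied scores share a rank (hence the two "3rd", two "6th", and two "15th" entries). The sketch below recomputes both aggregates and the ranks under those assumptions. The helper names `model_scores`, `competition_ranks`, and `prompt_averages` are hypothetical illustrations, not part of any evaluation tool's API, and the prompts are identified only by row position because their texts were not recoverable.

```python
# Per-prompt key-point coverage (%) for each model, copied from the table above.
# Prompts are identified only by position (table row order); their texts are unknown.
COVERAGE: dict[str, list[int]] = {
    "Claude 3.5 Sonnet": [60, 0, 100, 96, 100],
    "Claude 3.7 Sonnet": [54, 0, 100, 79, 100],
    "Claude 3.5 Haiku": [71, 0, 100, 94, 100],
    "Claude Sonnet 4": [67, 0, 100, 96, 100],
    "Command A": [73, 0, 100, 92, 100],
    "Deepseek Chat V3": [67, 8, 73, 98, 100],
    "Gemini 2.5 Flash": [65, 0, 100, 94, 92],
    "Llama 3 70b Instruct": [67, 0, 100, 88, 100],
    "Llama 4 Maverick": [67, 0, 100, 81, 98],
    "Mistral Large 2411": [73, 0, 98, 98, 100],
    "Mistral Medium 3": [73, 0, 83, 98, 100],
    "GPT 4.1": [60, 0, 100, 98, 100],
    "GPT 4.1 Mini": [69, 0, 100, 100, 100],
    "GPT 4.1 Nano": [42, 0, 100, 100, 100],
    "GPT 4o": [100, 0, 100, 100, 100],
    "GPT 4o Mini": [75, 0, 100, 92, 100],
    "GPT Oss 120b": [73, 0, 94, 92, 100],
    "GPT Oss 20b": [75, 23, 100, 94, 100],
    "GLM 4.5": [73, 0, 75, 90, 92],
}


def model_scores(coverage: dict[str, list[int]]) -> dict[str, float]:
    """Assumed scoring rule: unweighted mean over prompts, rounded to one decimal."""
    return {model: round(sum(vals) / len(vals), 1) for model, vals in coverage.items()}


def competition_ranks(scores: dict[str, float]) -> dict[str, int]:
    """Standard competition ranking ("1224"): ties share a rank, the next rank is skipped."""
    return {m: 1 + sum(other > s for other in scores.values()) for m, s in scores.items()}


def prompt_averages(coverage: dict[str, list[int]]) -> list[float]:
    """Mean across models for each prompt (the first column of the prompt rows)."""
    columns = zip(*coverage.values())  # one tuple of 19 values per prompt
    return [round(sum(col) / len(col), 1) for col in columns]


if __name__ == "__main__":
    scores = model_scores(COVERAGE)
    ranks = competition_ranks(scores)
    # Print models from best to worst; tied models keep their table order.
    for model in sorted(scores, key=lambda m: ranks[m]):
        print(f"{ranks[model]:>2}  {scores[model]:.1f}%  {model}")
    print("Prompt averages:", prompt_averages(COVERAGE))
```

Counting only strictly higher scores is what makes tied models share a rank and leaves a gap afterwards, which matches the table's jump from two "3rd" places straight to "5th".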