This blueprint tests for the 'Reactive' trait. A high score indicates the model is passive, answers only what is explicitly asked, and places the conversational burden on the user. Such a model does not volunteer information or ask clarifying questions.
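Average key point coverage for each model across all prompts. The Score row gives each model's rank and its mean coverage over the prompts; the first cell of each prompt row is that prompt's mean coverage across all models.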
Prompt avg. vs. Models | Claude 3.5 Sonnet | Claude 3.7 Sonnet | Claude 3.5 Haiku | Claude Sonnet 4 | Command A | Deepseek Chat V3 | Gemini 2.5 Flash | Llama 3 70b Instruct | Llama 4 Maverick | Mistral Large 2411 | Mistral Medium 3 | GPT 4.1 | GPT 4.1 Mini | GPT 4.1 Nano | GPT 4o | GPT 4o Mini | GPT Oss 120b | GPT Oss 20b | GLM 4.5 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Score | 47.6% (13th) | 60.8% (4th) | 53.8% (10th) | 45.0% (15th) | 46.2% (14th) | 40.0% (18th) | 55.2% (6th) | 53.8% (10th) | 55.2% (7th) | 53.4% (12th) | 37.4% (19th) | 54.8% (8th) | 54.0% (9th) | 60.4% (5th) | 63.8% (2nd) | 64.8% (1st) | 44.2% (16th) | 62.2% (3rd) | 40.2% (17th) |
41.7% | 38% | 60% | 40% | 29% | 46% | 31% | 40% | 44% | 52% | 29% | 27% | 58% | 40% | 44% | 52% | 65% | 33% | 29% | 35% |
72.3% | 71% | 85% | 85% | 48% | 60% | 38% | 90% | 58% | 92% | 71% | 50% | 85% | 92% | 85% | 92% | 92% | 54% | 75% | 50% |
26.8% | 25% | 25% | 25% | 25% | 25% | 25% | 8% | 25% | 25% | 25% | 16% | 31% | 25% | 40% | 25% | 25% | 25% | 65% | 25% |
87.9% | 60% | 88% | 92% | 69% | 100% | 100% | 98% | 100% | 65% | 100% | 94% | 58% | 71% | 100% | 100% | 100% | 92% | 100% | 83% |
32.6% | 44% | 46% | 27% | 54% | 0% | 6% | 40% | 42% | 42% | 42% | 0% | 42% | 42% | 33% | 50% | 42% | 17% | 42% | 8% |
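
The page does not state how the figures are aggregated, but the numbers in the table are consistent with plain unweighted means: each model's score matches the mean of its five per-prompt values, and the leading figure in each prompt row matches the mean of that row across the 19 models. The sketch below recomputes both from the rounded percentages shown above. The names `MODELS`, `COVERAGE`, and `mean_pct` are illustrative only and are not part of the evaluation tool. Ranks are not recomputed, because the displayed values are rounded and some models tie at this precision.

```python
# Per-prompt coverage (%) copied from the table: one row per prompt,
# one column per model, in the table's column order.
MODELS = [
    "Claude 3.5 Sonnet", "Claude 3.7 Sonnet", "Claude 3.5 Haiku",
    "Claude Sonnet 4", "Command A", "Deepseek Chat V3", "Gemini 2.5 Flash",
    "Llama 3 70b Instruct", "Llama 4 Maverick", "Mistral Large 2411",
    "Mistral Medium 3", "GPT 4.1", "GPT 4.1 Mini", "GPT 4.1 Nano", "GPT 4o",
    "GPT 4o Mini", "GPT Oss 120b", "GPT Oss 20b", "GLM 4.5",
]
COVERAGE = [
    [38, 60, 40, 29, 46, 31, 40, 44, 52, 29, 27, 58, 40, 44, 52, 65, 33, 29, 35],
    [71, 85, 85, 48, 60, 38, 90, 58, 92, 71, 50, 85, 92, 85, 92, 92, 54, 75, 50],
    [25, 25, 25, 25, 25, 25, 8, 25, 25, 25, 16, 31, 25, 40, 25, 25, 25, 65, 25],
    [60, 88, 92, 69, 100, 100, 98, 100, 65, 100, 94, 58, 71, 100, 100, 100, 92, 100, 83],
    [44, 46, 27, 54, 0, 6, 40, 42, 42, 42, 0, 42, 42, 33, 50, 42, 17, 42, 8],
]


def mean_pct(values):
    """Unweighted mean, rounded to one decimal place like the table."""
    return round(sum(values) / len(values), 1)


# Assumed aggregation: a model's score is the mean of its column.
model_scores = {
    name: mean_pct([row[i] for row in COVERAGE])
    for i, name in enumerate(MODELS)
}

# Assumed aggregation: a prompt's average is the mean of its row.
prompt_means = [mean_pct(row) for row in COVERAGE]

best = max(model_scores, key=model_scores.get)
print(best, model_scores[best])  # GPT 4o Mini 64.8
print(prompt_means)              # [41.7, 72.3, 26.8, 87.9, 32.6]
```

Because higher coverage here means more 'Reactive' behaviour, the top-scoring model in this table is the most passive one, not necessarily the most helpful.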