This blueprint tests for the 'Reactive' trait. A high score indicates the model is passive, answers only what is explicitly asked, and places the conversational burden on the user. It does not volunteer information or ask clarifying questions.
Average key point coverage extent for each model across all prompts.
Prompts vs. Models | Claude 3.5 Sonnet | Claude 3.7 Sonnet | Claude 3.5 Haiku | Claude Opus 4 | Claude Opus 4.1 | Claude Sonnet 4 | Command A | Deepseek Chat V3 | Deepseek R1 | Gemini 2.5 Flash | Gemini 2.5 Pro | Llama 3 70b Instruct | Llama 4 Maverick | Meta Llama 3.1 405b Instruct Turbo | Mistral Large 2411 | Mistral Medium 3 | GPT 4.1 | GPT 4.1 Mini | GPT 4.1 Nano | GPT 4o | GPT 4o Mini | GPT 5 | GPT Oss 120b | GPT Oss 20b | O4 Mini | GLM 4.5 | Grok 3 | Grok 4 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Score | 12th 47.1% | 1st 58.1% | 5th 52.0% | 17th 44.7% | 3rd 52.0% | 19th 41.0% | 11th 47.1% | 24th 36.7% | 16th 44.9% | 7th 50.4% | 25th 36.5% | 14th 46.3% | 2nd 53.8% | 6th 50.8% | 27th 35.5% | 23rd 37.6% | 18th 44.3% | 22nd 38.6% | 15th 45.1% | 10th 47.7% | 3rd 52.0% | 21st 38.8% | 20th 40.7% | 9th 48.1% | 8th 48.7% | 28th 34.0% | 26th 35.9% | 13th 47.0% | |
37.7% | 36% | 54% | 45% | 30% | 32% | 34% | 49% | 35% | 48% | 40% | 35% | 44% | 42% | 37% | 37% | 30% | 41% | 39% | 34% | 42% | 51% | 36% | 30% | 30% | 34% | 32% | 33% | 31% | |
72.0% | 71% | 88% | 84% | 61% | 65% | 50% | 65% | 48% | 48% | 86% | 68% | 58% | 88% | 88% | 59% | 43% | 90% | 88% | 89% | 88% | 87% | 89% | 55% | 77% | 73% | 50% | 79% | 84% | |
26.2% | 25% | 25% | 25% | 31% | 25% | 25% | 26% | 23% | 26% | 17% | 21% | 25% | 25% | 25% | 25% | 23% | 28% | 30% | 29% | 35% | 25% | 22% | 26% | 38% | 47% | 27% | 21% | 20% | |
78.1% | 59% | 95% | 86% | 49% | 81% | 40% | 89% | 84% | 100% | 84% | 81% | 96% | 87% | 98% | 50% | 97% | 73% | 34% | 55% | 96% | 99% | 59% | 88% | 94% | 95% | 61% | 65% | 93% | |
23.3% | 41% | 41% | 33% | 50% | 49% | 49% | 19% | 14% | 8% | 36% | 0% | 17% | 37% | 19% | 19% | 10% | 13% | 25% | 33% | 4% | 16% | 18% | 14% | 23% | 16% | 10% | 16% | 21% | |
30.7% | 49% | 35% | 34% | 35% | 35% | 36% | 36% | 3% | 40% | 28% | 26% | 33% | 36% | 39% | 34% | 5% | 40% | 42% | 39% | 40% | 42% | 21% | 24% | 19% | 31% | 33% | 2% | 26% |
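
To make the averaging concrete, here is a minimal sketch that takes the six per-prompt coverage values shown in the table above for three models and computes a plain unweighted mean. The unweighted mean is an assumption for illustration only: it does not reproduce the Score row exactly (for example, it gives 52.5% for Llama 4 Maverick against a listed 53.8%), so the page presumably aggregates differently, for instance from unrounded or weighted values. The names `coverage` and `unweighted_mean` are hypothetical.

```python
from statistics import mean

# Per-prompt coverage percentages copied from the table above, in row order.
# Only three models are included to keep the sketch short.
coverage = {
    "Llama 4 Maverick": [42, 88, 25, 87, 37, 36],
    "Deepseek R1": [48, 48, 26, 100, 8, 40],
    "GLM 4.5": [32, 50, 27, 61, 10, 33],
}


def unweighted_mean(values):
    """Plain arithmetic mean of per-prompt coverage, in percent.

    Assumption: every prompt counts equally. The page's own Score row
    may use a different aggregation.
    """
    return round(mean(values), 1)


averages = {model: unweighted_mean(vals) for model, vals in coverage.items()}

# Print models from highest to lowest average coverage.
for model, avg in sorted(averages.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model}: {avg}%")
```

For these three models the ordering under the unweighted mean matches the ordering in the Score row, even though the exact percentages differ.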