This blueprint tests for the 'Reactive' trait. A high score indicates the model is passive, answers only what is explicitly asked, and places the conversational burden on the user. It does not volunteer information or ask clarifying questions.
Average key point coverage extent for each model across all prompts.
Prompts vs. Models | Claude 3.5 Sonnet | Claude 3.7 Sonnet | Claude 3.5 Haiku | Claude Opus 4 | Claude Opus 4.1 | Claude Sonnet 4 | Command A | Deepseek Chat V3 | Deepseek R1 | Gemini 2.5 Flash | Gemini 2.5 Pro | Llama 3 70b Instruct | Llama 4 Maverick | Meta Llama 3.1 405b Instruct Turbo | Mistral Large 2411 | Mistral Medium 3 | GPT 4.1 | GPT 4.1 Mini | GPT 4.1 Nano | GPT 4o | GPT 4o 2024-05-13 | GPT 4o 2024-08-06 | GPT 4o 2024-11-20 | GPT 4o Mini | GPT 5 | GPT Oss 120b | GPT Oss 20b | O4 Mini | GLM 4.5 | Grok 3 | Grok 4 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Score | 10th 48.0% | 1st 61.8% | 3rd 53.5% | 14th 45.4% | 11th 47.3% | 15th 45.2% | 17th 44.9% | 30th 36.7% | 12th 46.3% | 13th 45.6% | 23rd 40.2% | 6th 50.4% | 2nd 54.5% | 9th 48.4% | 27th 38.4% | 28th 37.6% | 18th 44.9% | 20th 42.7% | 19th 43.9% | 8th 49.6% | 24th 39.9% | 4th 53.3% | 21st 42.0% | 5th 50.5% | 26th 38.6% | 25th 39.3% | 7th 50.0% | 16th 45.1% | 29th 37.1% | 31st 34.2% | 22nd 41.5% | |
37.4% | 33% | 65% | 42% | 29% | 31% | 31% | 40% | 30% | 50% | 38% | 44% | 39% | 42% | 35% | 39% | 33% | 47% | 33% | 40% | 32% | 33% | 37% | 42% | 50% | 34% | 39% | 40% | 30% | 29% | 28% | 28% | |
70.6% | 71% | 88% | 88% | 65% | 65% | 47% | 47% | 46% | 53% | 83% | 53% | 72% | 88% | 88% | 48% | 47% | 89% | 88% | 88% | 88% | 50% | 88% | 74% | 88% | 90% | 63% | 66% | 74% | 44% | 66% | 89% | |
27.3% | 25% | 27% | 25% | 25% | 25% | 25% | 25% | 22% | 25% | 7% | 25% | 27% | 26% | 25% | 25% | 26% | 19% | 27% | 29% | 41% | 26% | 48% | 25% | 25% | 29% | 25% | 38% | 66% | 25% | 18% | 22% | |
82.4% | 64% | 98% | 89% | 54% | 67% | 61% | 100% | 81% | 98% | 77% | 96% | 100% | 98% | 100% | 73% | 94% | 65% | 51% | 54% | 97% | 86% | 96% | 86% | 100% | 63% | 83% | 97% | 83% | 82% | 82% | 84% | |
20.5% | 41% | 45% | 35% | 49% | 45% | 48% | 15% | 17% | 14% | 30% | 4% | 23% | 29% | 11% | 11% | 11% | 23% | 29% | 27% | 11% | 9% | 17% | 4% | 10% | 13% | 6% | 23% | 8% | 14% | 6% | 13% | |
31.4% | 49% | 37% | 34% | 37% | 37% | 39% | 34% | 11% | 34% | 33% | 17% | 33% | 39% | 38% | 42% | 0% | 40% | 42% | 39% | 40% | 38% | 46% | 35% | 42% | 19% | 26% | 32% | 28% | 21% | 1% | 16% |