Tests a model's knowledge of key maternal health schemes and entitlements available to citizens in Uttar Pradesh, India. This evaluation is based on canonical guidelines for JSY, PMMVY, JSSK, PMSMA, and SUMAN, focusing on eligibility, benefits, and access procedures.
Average key point coverage extent for each model across all prompts.
Prompt (average across models) | Claude 3.5 Haiku | Claude Sonnet 4 | Command A | Deepseek Chat V3 | Gemini 2.5 Flash | Gemini 2.5 Pro Preview 05 06 | Mistral Large 2411 | Mistral Medium 3 | GPT 4.1 | GPT 4.1 Mini | GPT 4.1 Nano | GPT 4o | GPT 4o Mini | Grok 3 Mini |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Score | 13th 49.5% | 3rd 65.2% | 11th 54.7% | 4th 64.5% | 8th 56.8% | 5th 64.2% | 12th 54.7% | 2nd 65.5% | 7th 61.8% | 6th 62.8% | 9th 56.0% | 10th 55.7% | 14th 41.8% | 1st 66.8% |
94.5% | 83% | 100% | 78% | 100% | 100% | 98% | 100% | 100% | 100% | 98% | 95% | 98% | 73% | 100% |
56.5% | 48% | 80% | 50% | 68% | 40% | 80% | 40% | 80% | 55% | 45% | 40% | 40% | 40% | 85% |
48.8% | 47% | 38% | 28% | 50% | 50% | 88% | 41% | 44% | 50% | 59% | 50% | 50% | 38% | 50% |
37.0% | 19% | 44% | 47% | 47% | 38% | 13% | 47% | 50% | 47% | 47% | 25% | 34% | 13% | 47% |
83.6% | 75% | 88% | 91% | 84% | 91% | 72% | 75% | 88% | 88% | 94% | 88% | 84% | 59% | 94% |
31.0% | 25% | 41% | 34% | 38% | 22% | 34% | 25% | 31% | 31% | 34% | 38% | 28% | 28% | 25% |
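
The summary figures in the table appear to be plain unweighted means: each model's Score matches the mean of its six per-prompt values, and the first cell of each prompt row matches the mean of that row across the 14 models. The source does not document this, so it is an inference from the numbers. The following Python is a minimal sketch of that arithmetic using the rounded percentages shown above. The names `coverage`, `model_avg`, `prompt_avg`, and `ranking` are illustrative only and do not come from the evaluation tool.

```python
# Sketch: recompute the table's summary figures from the displayed cells.
# Assumption: "Score" is the unweighted mean over prompts, and the first
# column is the unweighted mean over models. Cells are already rounded,
# so recomputed values could differ slightly from the tool's own output.

models = [
    "Claude 3.5 Haiku", "Claude Sonnet 4", "Command A", "Deepseek Chat V3",
    "Gemini 2.5 Flash", "Gemini 2.5 Pro Preview 05 06", "Mistral Large 2411",
    "Mistral Medium 3", "GPT 4.1", "GPT 4.1 Mini", "GPT 4.1 Nano",
    "GPT 4o", "GPT 4o Mini", "Grok 3 Mini",
]

# One row per prompt, one column per model (percent coverage).
coverage = [
    [83, 100, 78, 100, 100, 98, 100, 100, 100, 98, 95, 98, 73, 100],
    [48, 80, 50, 68, 40, 80, 40, 80, 55, 45, 40, 40, 40, 85],
    [47, 38, 28, 50, 50, 88, 41, 44, 50, 59, 50, 50, 38, 50],
    [19, 44, 47, 47, 38, 13, 47, 50, 47, 47, 25, 34, 13, 47],
    [75, 88, 91, 84, 91, 72, 75, 88, 88, 94, 88, 84, 59, 94],
    [25, 41, 34, 38, 22, 34, 25, 31, 31, 34, 38, 28, 28, 25],
]

# Mean of each row: average coverage per prompt across all models.
prompt_avg = [sum(row) / len(row) for row in coverage]

# Mean of each column: average coverage per model across all prompts.
model_avg = {
    name: sum(row[i] for row in coverage) / len(coverage)
    for i, name in enumerate(models)
}

# Rank models from highest to lowest average (ties keep column order).
ranking = sorted(models, key=lambda name: model_avg[name], reverse=True)

for place, name in enumerate(ranking, start=1):
    print(f"{place:>2}. {name}: {model_avg[name]:.1f}%")
```

Under these assumptions the recomputed ranking starts with Grok 3 Mini and ends with GPT 4o Mini, in line with the Score row.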