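Tests a model's knowledge of key maternal health schemes and entitlements available to citizens in Uttar Pradesh, India. The evaluation is based on canonical guidelines for Janani Suraksha Yojana (JSY), Pradhan Mantri Matru Vandana Yojana (PMMVY), Janani Shishu Suraksha Karyakram (JSSK), Pradhan Mantri Surakshit Matritva Abhiyan (PMSMA), and Surakshit Matritva Aashwasan (SUMAN), focusing on eligibility, benefits, and access procedures.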
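Average key-point coverage extent for each model across all prompts. The Score row gives each model's mean coverage and its rank; in every other row, the first cell is that prompt's mean coverage across all models.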
Prompt (mean across models) | Claude 3.5 Haiku | Claude Sonnet 4 | Command A | Deepseek Chat V3 | Gemini 2.5 Flash | Gemini 2.5 Pro Preview 05 06 | Mistral Large 2411 | Mistral Medium 3 | GPT 4.1 | GPT 4.1 Mini | GPT 4.1 Nano | GPT 4o | GPT 4o Mini | Grok 3 | Grok 3 Mini
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
Score (mean, rank) | 48.8% (14th) | 63.8% (4th) | 63.2% (8th) | 68.7% (3rd) | 58.3% (10th) | 59.8% (9th) | 57.3% (12th) | 68.8% (2nd) | 63.7% (5th) | 63.7% (5th) | 54.7% (13th) | 58.0% (11th) | 47.5% (15th) | 63.3% (7th) | 69.0% (1st)
96.4% | 88% | 100% | 90% | 100% | 100% | 100% | 100% | 98% | 100% | 100% | 93% | 98% | 83% | 98% | 98%
61.3% | 48% | 80% | 60% | 83% | 40% | 80% | 40% | 80% | 60% | 75% | 40% | 40% | 48% | 60% | 85%
49.9% | 47% | 41% | 38% | 50% | 50% | 75% | 44% | 50% | 50% | 50% | 50% | 50% | 41% | 59% | 53%
38.9% | 19% | 47% | 53% | 44% | 38% | 16% | 41% | 50% | 47% | 41% | 13% | 47% | 22% | 47% | 59%
86.1% | 75% | 84% | 88% | 91% | 94% | 72% | 88% | 91% | 94% | 88% | 88% | 91% | 63% | 91% | 94%
30.9% | 16% | 31% | 50% | 44% | 28% | 16% | 31% | 44% | 31% | 28% | 44% | 22% | 28% | 25% | 25%
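The summary figures in the table can be recomputed from the per-prompt cells. The sketch below is a minimal reconstruction under stated assumptions, not the platform's own scoring code. It assumes that each model's score is an unweighted mean over the six prompts and that each prompt's first-column value is an unweighted mean over the fifteen models. It also assumes ties use standard competition ranking, which is why two models share 5th place and the next model is 7th. These assumptions are inferred only because they reproduce every published figure. The helper names (`model_scores`, `prompt_means`, `competition_ranks`) are hypothetical.

```python
# Per-prompt coverage (%) for each model, in table row order.
# Values are copied from the table and are already rounded to whole percents.
COVERAGE = {
    "Claude 3.5 Haiku": [88, 48, 47, 19, 75, 16],
    "Claude Sonnet 4": [100, 80, 41, 47, 84, 31],
    "Command A": [90, 60, 38, 53, 88, 50],
    "Deepseek Chat V3": [100, 83, 50, 44, 91, 44],
    "Gemini 2.5 Flash": [100, 40, 50, 38, 94, 28],
    "Gemini 2.5 Pro Preview 05 06": [100, 80, 75, 16, 72, 16],
    "Mistral Large 2411": [100, 40, 44, 41, 88, 31],
    "Mistral Medium 3": [98, 80, 50, 50, 91, 44],
    "GPT 4.1": [100, 60, 50, 47, 94, 31],
    "GPT 4.1 Mini": [100, 75, 50, 41, 88, 28],
    "GPT 4.1 Nano": [93, 40, 50, 13, 88, 44],
    "GPT 4o": [98, 40, 50, 47, 91, 22],
    "GPT 4o Mini": [83, 48, 41, 22, 63, 28],
    "Grok 3": [98, 60, 59, 47, 91, 25],
    "Grok 3 Mini": [98, 85, 53, 59, 94, 25],
}


def model_scores(coverage):
    """Assumed: unweighted mean over prompts per model, rounded to 0.1."""
    return {m: round(sum(v) / len(v), 1) for m, v in coverage.items()}


def prompt_means(coverage):
    """Assumed: unweighted mean over models per prompt, rounded to 0.1."""
    columns = zip(*coverage.values())  # one tuple per prompt row
    return [round(sum(c) / len(c), 1) for c in columns]


def competition_ranks(scores):
    """Assumed standard competition ranking ("1224"): tied scores share a
    rank, and the following score skips the tied positions."""
    ordered = sorted(scores.values(), reverse=True)
    # index() finds the first position of a score, so ties share that rank.
    return {m: ordered.index(s) + 1 for m, s in scores.items()}


if __name__ == "__main__":
    scores = model_scores(COVERAGE)
    ranks = competition_ranks(scores)
    for model in sorted(scores, key=ranks.get):
        print(f"{ranks[model]:>2}  {scores[model]:5.1f}%  {model}")
    print("Prompt means:", prompt_means(COVERAGE))
```

Running the script lists the models from Grok 3 Mini (69.0%, rank 1) down to GPT 4o Mini (47.5%, rank 15). The ranks are computed from the rounded scores. This is harmless here because the only exact tie (GPT 4.1 and GPT 4.1 Mini) is also tied before rounding.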