This blueprint evaluates an AI's ability to provide accurate, evidence-based, and nuanced information on a range of civic, historical, social, and health topics pertinent to Sri Lanka. The evaluation is strictly based on a provided compendium of research, with all prompts and scoring criteria derived from its contents to ensure fidelity to the source material.
Core Areas Tested:
These prompts were originally sourced from Factum. The rubrics were assembled via Gemini Deep Research.
Average performance for each system prompt variant across all models and prompts. The variants shown are [No System Prompt] and "The user is located in Sri Lanka."
Average key point coverage, broken down by system prompt variant.
Prompts vs. Models | Claude 3.5 Haiku | Claude Sonnet 4 | Command A | Deepseek Chat V3 | Deepseek R1 | Gemini 2.5 Flash | Gemini 2.5 Pro Preview 05 06 | Mistral Large 2411 | Mistral Medium 3 | GPT 4.1 Mini | GPT 4.1 Nano | GPT 4.1 | GPT 4o Mini | GPT 4o | O4 Mini | Kimi K2 | Grok 3 Mini | Grok 3 | Grok 4 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Score | 18th 23.7% | 12th 31.0% | 10th 33.2% | 11th 32.4% | 7th 38.5% | 6th 38.6% | 15th 27.8% | 14th 29.1% | 5th 40.1% | 13th 31.0% | 17th 25.4% | 9th 35.5% | 19th 21.4% | 16th 27.5% | 3rd 41.9% | 4th 40.9% | 8th 36.1% | 2nd 42.4% | 1st 45.5% | |
59.5% | 38% | 28% | 60% | 79% | 54% | 71% | 63% | 58% | 82% | 53% | 50% | 58% | 28% | 53% | 71% | 71% | 61% | 69% | 83% | |
57.0% | 38% | 54% | 38% | 71% | 88% | 54% | 32% | 38% | 79% | 54% | 32% | 54% | 38% | 38% | 71% | 71% | 67% | 83% | 83% | |
64.2% | 52% | 83% | 75% | 0% | 87% | 87% | 100% | 60% | 83% | 40% | 33% | 69% | 21% | 38% | 77% | 90% | 60% | 87% | 77% | |
69.6% | 66% | 76% | 79% | 35% | 90% | 69% | 76% | 59% | 75% | 69% | 54% | 64% | 64% | 64% | 76% | 64% | 77% | 85% | 81% | |
2.4% | 3% | 3% | 3% | 0% | 0% | 0% | 0% | 0% | 10% | 3% | 3% | 3% | 3% | 3% | 3% | 0% | 3% | 3% | 3% | |
51.8% | 27% | 64% | 66% | 52% | 52% | 80% | 82% | 52% | 36% | 70% | 45% | 57% | 27% | 52% | 27% | 52% | 55% | 55% | 34% | |
17.5% | 4% | 23% | 9% | 16% | 14% | 27% | 16% | 4% | 7% | 20% | 18% | 29% | 14% | 11% | 23% | 13% | 27% | 23% | 34% | |
21.7% | 23% | 14% | 11% | 23% | 36% | 11% | 6% | 30% | 11% | 23% | 11% | 20% | 11% | 11% | 36% | 36% | 17% | 30% | 52% | |
56.3% | 38% | 63% | 69% | 78% | 44% | 38% | 6% | 63% | 72% | 58% | 56% | 78% | 38% | 63% | 50% | 66% | 59% | 72% | 59% | |
0.6% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 7% | 3% | | |
14.1% | 0% | 0% | 15% | 0% | 40% | 21% | 0% | 0% | 0% | 6% | 0% | 15% | 6% | 13% | 44% | 13% | 21% | 8% | 65% | |
13.5% | 0% | 27% | 13% | 0% | 13% | 13% | 0% | 13% | 17% | 22% | 7% | 23% | 7% | 7% | 33% | 10% | 17% | 17% | 17% | |
17.1% | 27% | 14% | 11% | 8% | 11% | 33% | 5% | 5% | 17% | 14% | 19% | 27% | 11% | 5% | 28% | 13% | 23% | 20% | 33% | |
9.5% | 20% | 3% | 7% | 17% | 7% | 0% | 7% | 0% | 23% | 3% | 17% | 23% | 10% | 7% | 7% | 0% | 10% | 10% | 10% | |
29.3% | 23% | 36% | 25% | 27% | 36% | 45% | 18% | 39% | 18% | 18% | 27% | 23% | 5% | 27% | 39% | 39% | 41% | 42% | | |
56.5% | 25% | 25% | 61% | 68% | 71% | 71% | 27% | 50% | 93% | 50% | 20% | 50% | 20% | 63% | 100% | 84% | 57% | 55% | 84% | |
36.9% | 32% | 29% | 40% | 51% | 43% | 31% | 22% | 42% | 38% | 28% | 28% | 46% | 28% | 28% | 53% | 38% | 21% | 60% | 44% | |
30.7% | 5% | 19% | 11% | 52% | 30% | 50% | 11% | 19% | 42% | 23% | 23% | 25% | 16% | 25% | 45% | 39% | 39% | 58% | 52% | |
15.4% | 4% | 11% | 14% | 7% | 11% | 21% | 0% | 14% | 25% | 14% | 25% | 11% | 11% | 11% | 25% | 21% | 32% | 18% | 18% | |
49.7% | 48% | 48% | 48% | 65% | 52% | 58% | 58% | 56% | 54% | 52% | 48% | 31% | 52% | 52% | 42% | 58% | 37% | 48% | 37% |
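
The leading percentage in each prompt row appears to be the mean of that row's per-model cells, with empty cells left out of the average. The page does not state this; it is inferred from the numbers. The sketch below reproduces it for two rows copied from the table. The helper names `parse_pct` and `row_average` are hypothetical. Because the published cells are already rounded, other rows could differ from this calculation by a tenth of a point.

```python
from typing import Optional


def parse_pct(cell: str) -> Optional[float]:
    """Return the number in a cell such as '23%', or None for an empty cell."""
    text = cell.strip().rstrip("%")
    return float(text) if text else None


def row_average(row: str) -> float:
    """Average the per-model cells of one table row.

    The first cell (the published row average) is ignored, and empty
    cells are skipped rather than counted as zero (an assumption).
    """
    cells = row.split("|")[1:]
    values = [v for v in map(parse_pct, cells) if v is not None]
    return sum(values) / len(values)


# A row with all 19 model cells present.
full_row = (
    "2.4% | 3% | 3% | 3% | 0% | 0% | 0% | 0% | 0% | 10% | 3% | 3% | 3% "
    "| 3% | 3% | 3% | 0% | 3% | 3% | 3% | |"
)

# A row with one empty model cell (18 values).
gap_row = (
    "0.6% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% "
    "| 0% | 0% | 0% | 0% | 7% | 3% ||"
)

print(round(row_average(full_row), 1))  # 2.4
print(round(row_average(gap_row), 1))   # 0.6
```

Counting the empty cell as zero instead would give 10 / 19, about 0.5%, for the second row, which does not match the published 0.6%.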