Please wait while we gather all the unique runs for this blueprint.
Please wait while we gather all the unique runs for this blueprint.
Please wait while we prepare the detailed comparison.
This blueprint evaluates an AI's ability to provide accurate, practical agricultural guidance based on the pioneering video-based extension methodology of Digital Green. The prompts are derived from the DigiGreen/AgricultureVideosQnA Hugging Face datasets, which are built from real-world questions posed by farmers.
Methodological Significance: Digital Green's methodology, founded by Rikin Gandhi, revolutionizes agricultural education through hyperlocal videos featuring local farmers demonstrating best practices. Their community-mediated video approach has reached millions of farmers across India, Ethiopia, and other regions. This blueprint tests whether AI systems can provide similarly contextual, practical, and culturally appropriate guidance.
What This Blueprint Tests: The evaluation covers essential farming knowledge spanning seed treatment, pest management, cultivation techniques, and more. Each prompt is paired with citations to actual educational videos from Digital Green's library, representing real-world agricultural challenges.
Geographic and Cultural Context: This blueprint emphasizes Global South agricultural contexts, particularly Indian farming systems, reflecting Digital Green's primary operational areas. The questions address challenges in subsistence and small-scale commercial farming, including resource constraints and climate adaptation.
Key Agricultural Domains Covered:
Evaluation Approach: Each response is evaluated against detailed rubric points extracted directly from ideal responses, focusing on technical accuracy, practical applicability, safety considerations, and contextual appropriateness for resource-constrained farming environments.
Average performance for each system prompt variant across all models and prompts.
[No System Prompt]
You are a highly knowledgeable and precise assistant. Your primary objective is to provide responses that are factually accurate, comprehensive, exceptionally clear, and well-structured, mirroring the quality standard of ideal reference answers you may have access to. **Contextual Awareness:** * **Primary Domains:** Be aware that queries often pertain to specific domains like **agriculture, farming practices (including specific crop cultivation, pest management, soil preparation, irrigation), and rural community development/organization.** * **Geographical Focus:** The context is frequently related to **India and the broader Global South.** Interpret terms and practices accordingly. * **Specialized Terminology:** Expect and correctly interpret specialized terminology related to these domains, such as **'Village Organization', 'Self Help Group (SHG)', 'CIF loan', specific crop names (e.g., Rabi paddy, turmeric, chili), farming techniques, and local organizational roles.** Leverage the provided examples or your knowledge base relevant to these contexts to understand these terms accurately. **Response Guidelines:** 1. **Address the Query Comprehensively:** Ensure all parts of the user's question are answered thoroughly, using your contextual understanding of the domain and terminology. 2. **Prioritize Accuracy and Detail:** Base answers on factual information relevant to the established context. Provide specific details (like quantities, timings, steps) or step-by-step instructions as appropriate, similar to reference examples. 3. **Structure for Clarity:** Organize your response logically using clear paragraphs, numbered lists, or bullet points to enhance readability, especially for procedural explanations. 4. **Maintain Tone:** Adopt a helpful, informative, and objective tone suitable for explaining practical processes or organizational structures. 5. **Be Focused:** Provide complete answers but avoid unnecessary verbosity or tangential information outside the query's core scope and context.
Average key point coverage, broken down by system prompt variant. Select a tab to view its results.
Prompts vs. Models | Claude 3.5 Haiku | Claude Sonnet 4 | Command A | Deepseek Chat V3 | Deepseek R1 | Gemini 2.5 Flash | Gemini 2.5 Pro | Mistral Large 2411 | Mistral Medium 3 | GPT 4.1 Mini | GPT 4.1 Nano | GPT 4.1 | GPT 4o Mini | GPT 4o | O4 Mini | Kimi K2 Instruct | Grok 3 Mini | Grok 3 | Grok 4 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Score | 16th 19.0% | 12th 20.6% | 13th 20.5% | 9th 23.6% | 10th 23.6% | 6th 25.4% | 19th 14.6% | 18th 16.8% | 11th 22.8% | 8th 23.7% | 14th 19.1% | 7th 24.1% | 17th 18.1% | 15th 19.1% | 5th 27.4% | 2nd 32.1% | 3rd 28.3% | 4th 27.8% | 1st 33.9% | |
3.5% | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | |
83.2% | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | |
18.9% | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | |
22.8% | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | |
12.6% | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | |
5.8% | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | |
10.2% | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | |
18.0% | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | |
56.3% | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | |
10.8% | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | |
20.8% | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | |
54.1% | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | |
9.4% | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | |
9.9% | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | |
0.0% | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | |
1.0% | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | |
51.6% | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | |
12.3% | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | |
9.2% | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | |
37.5% | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | |
2.7% | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | |
6.4% | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | |
3.7% | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | |
12.1% | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | |
69.5% | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | |
4.7% | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | |
13.5% | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | |
7.1% | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | |
11.4% | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | |
34.0% | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | |
2.9% | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | |
28.4% | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | |
32.1% | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | |
80.0% | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | |
24.8% | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | |
54.8% | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | |
35.6% | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | |
9.5% | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! |