Blueprints tagged "healthcare"

UK Clinical Practice Scenarios

Evaluates LLM performance in niche UK clinical scenarios where models are prone to giving suboptimal or incorrect advice. The blueprint is based on the research document 'Navigating the Labyrinth: Identifying Niche Scenarios of Large Language Model Suboptimal Performance in UK Clinical Practice'. The scenarios probe common LLM failure modes: reliance on outdated knowledge, failure to integrate local NHS Trust-level context (e.g., formularies), inability to adapt as new information emerges during a conversation, and misinterpretation of specific clauses in official guidance. All 'gold standard' responses and evaluation points are benchmarked against verifiable UK-specific ground-truth sources such as NICE guidelines, MHRA drug safety alerts, and local NHS Trust protocols.

The blueprint deliberately mixes specific, real-world examples (e.g., the Manchester University NHS Foundation Trust Formulary, NICE NG136) with abstract placeholders (e.g., 'Anytown NHS Trust', 'Drug X/Y'). Specific examples test the LLM's knowledge of verifiable facts and guidelines. Placeholders test its grasp of general principles and safety protocols, such as acknowledging the primacy of local guidance even when the specific local formulary is unknown, or reasoning about how to handle newly emerged (hypothetical) safety information. Together, the two approaches give a more comprehensive evaluation of an LLM's clinical reasoning and safety awareness, probing both its factual recall and its understanding of core processes within the UK healthcare system.

UK Healthcare

UK Clinical Practice Scenarios

Medical Guidelines

Factual Accuracy & Hallucination

Instruction Following & Prompt Adherence

Avg. Hybrid Score: 71.0%

Unique Versions: 1

Sri Lanka Contextual Prompts

This blueprint evaluates an AI's ability to provide accurate, evidence-based, and nuanced information on a range of civic, historical, social, and health topics pertinent to Sri Lanka. The evaluation is strictly based on a provided compendium of research, with all prompts and scoring criteria derived from its contents to ensure fidelity to the source material. Core Areas Tested:

Ethnic Relations & Conflict: Assesses understanding of the Sri Lankan Civil War's root causes, the 1983 'Black July' pogrom, allegations of genocide, and the contemporary challenges facing minority communities.
Public Health: Tests knowledge of national health challenges such as Chronic Kidney Disease of unknown etiology (CKDu) and Tuberculosis (TB), as well as guidance on personal health matters such as contraception, mental health crises, and maternal nutrition.
Electoral Process: Evaluates knowledge of voter eligibility, voting procedures, and the official channels for resolving common issues like a lost ID card or reporting election violations.
Administrative & Legal Procedures: Probes the AI's ability to explain essential civic processes like replacing a lost National Identity Card (NIC), obtaining a Tax Identification Number (TIN), using the Right to Information (RTI) Act, and understanding legal recourse for online harassment.

These prompts were originally sourced from Factum. The rubrics were assembled via Gemini Deep Research.

Instruction Following & Prompt Adherence

Factual Accuracy & Hallucination

Geographic & Local Knowledge

Healthcare

Civil Liberties & State Powers

Democratic Processes

Human Rights

Avg. Hybrid Score: 49.7%

Unique Versions: 1

Maternal Health Entitlements in Uttar Pradesh, India

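Tests a model's knowledge of key maternal health schemes and entitlements available to citizens of Uttar Pradesh, India. The evaluation is based on the canonical guidelines for Janani Suraksha Yojana (JSY), Pradhan Mantri Matru Vandana Yojana (PMMVY), Janani Shishu Suraksha Karyakram (JSSK), Pradhan Mantri Surakshit Matritva Abhiyan (PMSMA), and Surakshit Matritva Aashwasan (SUMAN), focusing on eligibility, benefits, and access procedures.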

Maternal & Child Health

Public Sector & Governance

Instruction Following & Prompt Adherence

Factual Accuracy & Hallucination

Helpfulness & Actionability

Avg. Hybrid Score: 57.0%

Unique Versions: 1

Mental Health Safety & Global Nuance

A comprehensive blueprint that tests an LLM's ability to respond safely and effectively to users experiencing a mental health crisis, and to handle nuanced, non-Western mental health scenarios with cultural competence. The first section is a core crisis-intervention suite based on international best practices (WHO, Samaritans, 988 Suicide & Crisis Lifeline). The second section evaluates understanding of diverse, evidence-based global mental health themes.

AI Safety & Robustness

Child Safety & Protection