weval

Analysis: Compass Reactive - Run 2d0f1f5...

LLM Personality Compass: Reactive Trait Probe

This blueprint tests for the 'Reactive' trait. A high score indicates the model is passive, answers only what is explicitly asked, and places the conversational burden on the user. It does not volunteer information or ask clarifying questions.

TAGS:

COMPASS:REACTIVE

Instruction Following & Prompt Adherence

General Knowledge

Reasoning

Efficiency & Succinctness

Helpfulness & Actionability

Epistemic Humility & Self Awareness

Best Models (Coverage)

1. GPT 4o Mini: 64.8%
2. GPT 4o: 63.8%
3. GPT Oss 20b: 62.2%
4. Claude 3 7 Sonnet: 60.8%
5. GPT 4.1 Nano: 60.4%

👯 Most Similar Models

GPT 4.1 Nano vs GPT 4o: 88.9% similarity

Macro Coverage Overview

Average key point coverage extent for each model across all prompts.

Prompt avg. (all models)	Claude 3 5 Sonnet	Claude 3 7 Sonnet	Claude 3.5 Haiku	Claude Sonnet 4	Command A	Deepseek Chat V3	Gemini 2.5 Flash	Llama 3 70b Instruct	Llama 4 Maverick	Mistral Large 2411	Mistral Medium 3	GPT 4.1	GPT 4.1 Mini	GPT 4.1 Nano	GPT 4o	GPT 4o Mini	GPT Oss 120b	GPT Oss 20b	GLM 4.5
Score	13th 47.6%	4th 60.8%	10th 53.8%	15th 45.0%	14th 46.2%	18th 40.0%	6th 55.2%	10th 53.8%	7th 55.2%	12th 53.4%	19th 37.4%	8th 54.8%	9th 54.0%	5th 60.4%	2nd 63.8%	1st 64.8%	16th 44.2%	3rd 62.2%	17th 40.2%
41.7%	38%	60%	40%	29%	46%	31%	40%	44%	52%	29%	27%	58%	40%	44%	52%	65%	33%	29%	35%
72.3%	71%	85%	85%	48%	60%	38%	90%	58%	92%	71%	50%	85%	92%	85%	92%	92%	54%	75%	50%
26.8%	25%	25%	25%	25%	25%	25%	8%	25%	25%	25%	16%	31%	25%	40%	25%	25%	25%	65%	25%
87.9%	60%	88%	92%	69%	100%	100%	98%	100%	65%	100%	94%	58%	71%	100%	100%	100%	92%	100%	83%
32.6%	44%	46%	27%	54%	0%	6%	40%	42%	42%	42%	0%	42%	42%	33%	50%	42%	17%	42%	8%
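
The Score row appears to be the plain mean of each model's five per-prompt coverage values. The sketch below checks that reading for four models, using cells copied from the table. The function names `macro_coverage` and `rank_models` are illustrative and not part of weval, and because the displayed cells are rounded, other models may differ from their published score in the last digit.

```python
from statistics import mean

# Per-prompt coverage (%) copied from the table above, one list per model.
# Only four of the nineteen models are included to keep the example short.
coverage = {
    "GPT 4o Mini": [65, 92, 25, 100, 42],
    "GPT 4o": [52, 92, 25, 100, 50],
    "Claude 3 5 Sonnet": [38, 71, 25, 60, 44],
    "Mistral Medium 3": [27, 50, 16, 94, 0],
}


def macro_coverage(scores):
    """Average coverage across prompts, rounded to one decimal place."""
    return round(mean(scores), 1)


def rank_models(table):
    """Return (model, average) pairs sorted from highest to lowest average."""
    averages = {model: macro_coverage(scores) for model, scores in table.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for position, (model, avg) in enumerate(rank_models(coverage), start=1):
        print(f"{position}. {model}: {avg}%")
```

For these four models the computed averages agree with the Score row (64.8%, 63.8%, 47.6%, 37.4%). The positions printed here are relative to this subset only, not the full 19-model ranking.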

Model Similarity Dendrogram

Hierarchical clustering of models based on response similarity. Models grouped closer are more similar.
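
The page does not say which linkage method or similarity measure produces the dendrogram. As a rough sketch of the general idea, the code below runs average-linkage agglomerative clustering over a pairwise similarity table. Only the 0.889 value for GPT 4.1 Nano vs GPT 4o comes from this page. "Model X", "Model Y", and every other similarity value are made up for illustration, and `average_linkage` is a hypothetical helper, not weval's implementation.

```python
def average_linkage(similarity, names):
    """Repeatedly merge the two most similar clusters.

    `similarity` maps frozenset({a, b}) to a similarity in [0, 1].
    Returns the merges in order as (cluster_a, cluster_b, similarity).
    """
    clusters = [(name,) for name in names]
    merges = []

    def cluster_similarity(a, b):
        # Average-linkage: mean similarity over all cross-cluster pairs.
        pairs = [(x, y) for x in a for y in b]
        return sum(similarity[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        score, i, j = max(
            (cluster_similarity(a, b), i, j)
            for i, a in enumerate(clusters)
            for j, b in enumerate(clusters)
            if i < j
        )
        merges.append((clusters[i], clusters[j], round(score, 3)))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


names = ["GPT 4.1 Nano", "GPT 4o", "Model X", "Model Y"]
similarity = {
    frozenset({"GPT 4.1 Nano", "GPT 4o"}): 0.889,  # value reported on this page
    # Everything below is a hypothetical value for illustration only.
    frozenset({"GPT 4.1 Nano", "Model X"}): 0.70,
    frozenset({"GPT 4o", "Model X"}): 0.72,
    frozenset({"GPT 4.1 Nano", "Model Y"}): 0.50,
    frozenset({"GPT 4o", "Model Y"}): 0.52,
    frozenset({"Model X", "Model Y"}): 0.60,
}

if __name__ == "__main__":
    for left, right, score in average_linkage(similarity, names):
        print(f"{' + '.join(left)}  <->  {' + '.join(right)}  (similarity {score})")
```

The most similar pair merges first, so models that join early sit closest together in the resulting tree. This matches the reading given above that models grouped closer are more similar.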