weval

Analysis: Compass Proactive - Run 8b9a7de...

LLM Personality Compass: Proactive Trait Probe

This blueprint tests for the 'Proactive' trait. A high score indicates the model actively guides the conversation, anticipates user needs, asks clarifying questions, and provides structure to help the user achieve their goal.

TAGS: COMPASS:PROACTIVE, Instruction Following & Prompt Adherence, Reasoning, Helpfulness & Actionability, Clarity & Readability, Nuance & Depth

Best Models (Coverage)

1. GPT Oss 120b: 89.0%
2. GLM 4.5: 86.4%
3. Mistral Medium 3: 84.6%
4. GPT Oss 20b: 84.2%
5. Llama 3 70b Instruct: 79.6%

👯 Most Similar Models

Deepseek Chat V3 vs Mistral Medium 3: 89.7% similarity


Macro Coverage Overview

Average key point coverage extent for each model across all prompts.



Model	Rank	Score	Prompt 1	Prompt 2	Prompt 3	Prompt 4	Prompt 5
Claude 3 5 Sonnet	17th	53.8%	50%	60%	42%	50%	67%
Claude 3 7 Sonnet	12th	67.2%	52%	73%	38%	98%	75%
Claude 3.5 Haiku	19th	46.6%	52%	44%	8%	50%	79%
Claude Sonnet 4	6th	78.0%	90%	56%	58%	98%	88%
Command A	14th	63.4%	73%	73%	75%	50%	46%
Deepseek Chat V3	7th	77.4%	83%	58%	83%	69%	94%
Gemini 2.5 Flash	8th	75.4%	52%	71%	58%	100%	96%
Llama 3 70b Instruct	5th	79.6%	60%	69%	81%	100%	88%
Llama 4 Maverick	18th	49.0%	58%	35%	35%	19%	98%
Mistral Large 2411	11th	68.4%	96%	69%	56%	54%	67%
Mistral Medium 3	3rd	84.6%	98%	69%	79%	77%	100%
GPT 4.1	9th	69.6%	52%	77%	46%	98%	75%
GPT 4.1 Mini	9th	69.6%	52%	63%	54%	100%	79%
GPT 4.1 Nano	13th	64.8%	85%	77%	58%	52%	52%
GPT 4o	16th	59.6%	56%	69%	71%	50%	52%
GPT 4o Mini	15th	61.0%	88%	67%	31%	54%	65%
GPT Oss 120b	1st	89.0%	88%	65%	98%	94%	100%
GPT Oss 20b	4th	84.2%	69%	75%	77%	100%	100%
GLM 4.5	2nd	86.4%	75%	75%	94%	100%	88%
Average across models		69.9%	65.5%	60.1%	74.4%	79.4%
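
The per-model Score values in the table appear to be the unweighted mean of that model's five per-prompt coverage figures. The Python sketch below checks this for three models using the rounded percentages shown on the page. The names `coverage` and `macro_score` are illustrative and not part of weval, and the site may well compute its scores from unrounded values.

```python
# Per-prompt coverage (%) for three models, copied from the table.
coverage = {
    "GPT Oss 120b": [88, 65, 98, 94, 100],
    "GLM 4.5": [75, 75, 94, 100, 88],
    "Claude 3.5 Haiku": [52, 44, 8, 50, 79],
}


def macro_score(values):
    """Unweighted mean of per-prompt coverage, rounded to one decimal."""
    return round(sum(values) / len(values), 1)


# Assumption: every prompt carries equal weight in the overall score.
scores = {model: macro_score(values) for model, values in coverage.items()}

# Sort models from highest to lowest average coverage.
ranking = sorted(scores, key=scores.get, reverse=True)

for position, model in enumerate(ranking, start=1):
    print(f"{position}. {model}: {scores[model]:.1f}%")
```

For these three models the computed means match the listed scores (89.0%, 86.4%, and 46.6%), which is consistent with an equal-weight average over prompts.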

Model Similarity Dendrogram

Hierarchical clustering of models based on response similarity. Models grouped closer are more similar.
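
To illustrate how a dendrogram of this kind can be built, here is a minimal Python sketch of agglomerative clustering with average linkage. Everything in it is an assumption for illustration: the page does not say which linkage method or similarity measure it uses, the models are placeholders named A to D, and the similarity values are made up rather than taken from this run.

```python
from itertools import combinations

# Hypothetical pairwise similarities (0..1) between four placeholder models.
similarity = {
    frozenset({"A", "B"}): 0.90,
    frozenset({"A", "C"}): 0.60,
    frozenset({"A", "D"}): 0.55,
    frozenset({"B", "C"}): 0.65,
    frozenset({"B", "D"}): 0.50,
    frozenset({"C", "D"}): 0.80,
}


def cluster_similarity(c1, c2):
    """Average linkage: mean similarity over all cross-cluster pairs."""
    pairs = [(a, b) for a in c1 for b in c2]
    return sum(similarity[frozenset({a, b})] for a, b in pairs) / len(pairs)


def agglomerate(items):
    """Repeatedly merge the two most similar clusters; return the merge order."""
    clusters = [(item,) for item in items]
    merges = []
    while len(clusters) > 1:
        c1, c2 = max(
            combinations(clusters, 2),
            key=lambda pair: cluster_similarity(*pair),
        )
        merges.append((c1, c2, round(cluster_similarity(c1, c2), 3)))
        # Replace the two merged clusters with their union.
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


merges = agglomerate(["A", "B", "C", "D"])
for left, right, sim in merges:
    print(f"merge {left} + {right} at similarity {sim}")
```

Each merge corresponds to one join in the dendrogram: pairs merged at high similarity sit close together, and the final merge forms the root.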