weval

Analysis: Compass Figurative - Run 0779c25...

LLM Personality Compass: Figurative Trait Probe

This blueprint tests for the 'Figurative' trait. A high score indicates that the model defaults to analogies, metaphors, and creative interpretations when it explains concepts and answers questions.

TAGS:

COMPASS:FIGURATIVE

Creative Writing

Instruction Following & Prompt Adherence

Factual Accuracy & Hallucination

Reasoning

Aesthetic & Artistic Quality

Best Models (Coverage)

1. GLM 4.5: 80.4%
2. GPT Oss 20b: 78.8%
3. GPT 4.1 Nano: 69.2%
4. Claude Sonnet 4: 68.8%
5. Mistral Medium 3: 67.0%

👯 Most Similar Models

Claude 3.5 Sonnet vs. Claude 3.5 Haiku

91.2% similarity

Macro Coverage Overview

Average key-point coverage for each model across all prompts.

Prompt avg.	Claude 3.5 Sonnet	Claude 3.7 Sonnet	Claude 3.5 Haiku	Claude Sonnet 4	Command A	Deepseek Chat V3	Gemini 2.5 Flash	Llama 3 70b Instruct	Llama 4 Maverick	Mistral Large 2411	Mistral Medium 3	GPT 4.1	GPT 4.1 Mini	GPT 4.1 Nano	GPT 4o	GPT 4o Mini	GPT Oss 120b	GPT Oss 20b	GLM 4.5
Score	12th 64.4%	8th 65.6%	16th 62.6%	4th 68.8%	14th 63.6%	15th 63.2%	10th 64.8%	7th 65.8%	11th 64.6%	19th 56.6%	5th 67.0%	6th 66.6%	13th 64.4%	3rd 69.2%	8th 65.6%	18th 59.2%	17th 62.2%	2nd 78.8%	1st 80.4%
59.8%	50%	58%	50%	58%	58%	58%	65%	58%	58%	50%	58%	50%	50%	60%	58%	50%	60%	94%	94%
95.1%	100%	100%	100%	98%	100%	100%	96%	100%	100%	77%	100%	98%	94%	96%	100%	100%	98%	75%	75%
45.5%	45%	35%	48%	50%	52%	29%	55%	44%	52%	46%	48%	50%	45%	46%	43%	29%	38%	50%	60%
100.0%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%
29.4%	27%	35%	15%	38%	8%	29%	8%	27%	13%	10%	29%	35%	33%	44%	27%	17%	15%	75%	73%
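The Score row is consistent with an unweighted mean of each model's five per-prompt values: for GLM 4.5, (94 + 75 + 60 + 100 + 73) / 5 = 80.4%. The page does not spell out its aggregation, so the sketch below simply assumes that mean and recomputes averages and ranks for a subset of models copied from the table. Ties share a rank in this sketch; the page's own tie-breaking rule is not stated.

```python
# Per-prompt key-point coverage (%) copied from the table, one value per
# prompt row in table order. Only a subset of models is included.
COVERAGE = {
    "GLM 4.5": [94, 75, 60, 100, 73],
    "GPT Oss 20b": [94, 75, 50, 100, 75],
    "GPT 4.1 Nano": [60, 96, 46, 100, 44],
    "Claude Sonnet 4": [58, 98, 50, 100, 38],
    "Mistral Medium 3": [58, 100, 48, 100, 29],
    "Claude 3.7 Sonnet": [58, 100, 35, 100, 35],
    "GPT 4o": [58, 100, 43, 100, 27],
}


def macro_average(scores: list[float]) -> float:
    """Assumed aggregation: unweighted mean, so every prompt counts equally."""
    return sum(scores) / len(scores)


def competition_ranks(averages: dict[str, float]) -> dict[str, int]:
    """Rank 1 = highest average; tied models share a rank (1, 2, 2, 4 style)."""
    return {
        model: 1 + sum(other > value for other in averages.values())
        for model, value in averages.items()
    }


averages = {model: macro_average(s) for model, s in COVERAGE.items()}
ranks = competition_ranks(averages)

# Print in rank order, breaking ties alphabetically for stable output.
for model in sorted(averages, key=lambda m: (ranks[m], m)):
    print(f"{ranks[model]}. {model}: {averages[model]:.1f}%")
```

For the top five models this reproduces the Best Models (Coverage) figures (80.4%, 78.8%, 69.2%, 68.8%, 67.0%), and Claude 3.7 Sonnet and GPT 4o tie at 65.6%.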

Model Similarity Dendrogram

Hierarchical clustering of models based on response similarity. Models grouped closer are more similar.
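The page does not say how response similarity is measured or which linkage rule the dendrogram uses. As a stand-in, this sketch clusters a few models by the cosine similarity of their per-prompt coverage vectors from the table, using average linkage (UPGMA); both choices are assumptions, and the resulting tree need not match the page's dendrogram or its 91.2% similarity figure. The merge history it returns is exactly what a dendrogram draws: each merge joins two branches at a height equal to their distance.

```python
import math
from itertools import combinations

# Stand-in feature vectors: per-prompt coverage (%) copied from the table,
# one value per prompt row. The page's actual "response similarity" measure
# is not described, so this is an assumption for illustration only.
FEATURES = {
    "Claude 3.5 Sonnet": [50, 100, 45, 100, 27],
    "Claude 3.5 Haiku": [50, 100, 48, 100, 15],
    "Mistral Large 2411": [50, 77, 46, 100, 10],
    "GPT Oss 20b": [94, 75, 50, 100, 75],
    "GLM 4.5": [94, 75, 60, 100, 73],
}


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def average_linkage(features: dict[str, list[float]]):
    """Agglomerative clustering with average linkage.

    Returns the merge history as (left, right, distance) tuples, where
    distance = 1 - cosine similarity.
    """
    names = list(features)
    # Pairwise distances between individual models.
    dist = {
        frozenset(pair): 1 - cosine_similarity(features[pair[0]], features[pair[1]])
        for pair in combinations(names, 2)
    }

    def cluster_distance(c1: tuple, c2: tuple) -> float:
        # Average linkage: mean distance over all cross-cluster member pairs.
        pairs = [dist[frozenset((a, b))] for a in c1 for b in c2]
        return sum(pairs) / len(pairs)

    clusters = [(name,) for name in names]  # start with singleton clusters
    merges = []
    while len(clusters) > 1:
        # Merge the two closest clusters and record the merge height.
        c1, c2 = min(combinations(clusters, 2), key=lambda p: cluster_distance(*p))
        merges.append((c1, c2, cluster_distance(c1, c2)))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
    return merges


for left, right, d in average_linkage(FEATURES):
    print(f"{' + '.join(left)}  <->  {' + '.join(right)}: distance {d:.4f}")
```

On these stand-in vectors the first merge is GLM 4.5 with GPT Oss 20b, followed by Claude 3.5 Sonnet with Claude 3.5 Haiku, which illustrates that the choice of similarity measure can change which pair counts as "most similar".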