Human votes: 73.0% yes, 27.0% no (655 explicit votes)
Model average: 96.6% yes, 3.4% no (across the current model set)
Closest current model: 85.0% yes
Least aligned model: 73.0 point gap
Legacy GPT-4o baseline: 100.0% yes, a 27.0 point gap against humans
Biggest model gap: 73.0 percentage points on this image
Current classification: Split concept
Models compared: 67 current runs

HMB: Split concept
Benchmark image 08
Hamburger
Hamburger "Sandwich"
A standard burger stacks bun, patty, lettuce, and tomato in the exact format that turns otherwise competent adults into constitutional originalists. It is the canonical 'yes in theory, no in vibes' sandwich fight.
Under development: this benchmark and its published results are provisional, not final.
At a glance
How this photo split the room
meta-llama/llama-3.2-11b-vision-instruct
baidu/ernie-4.5-vl-28b-a3b
Benchmark context
Model spread
How Models Align with Human Responses
This compares each model's yes-rate against the human yes-rate to show how closely the model aligns with people.
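The point gaps quoted on this page look like the absolute difference between a model's yes-rate and the human yes-rate, in percentage points. That reading is an assumption, but it matches the one pair the page states in full: the legacy GPT-4o baseline at 100.0% yes against humans at 73.0% yes gives a 27.0 point gap. A minimal sketch under that assumption, where `point_gap` is a hypothetical helper name and not part of the benchmark:

```python
def point_gap(model_yes_pct: float, human_yes_pct: float) -> float:
    """Absolute difference between two yes-rates, in percentage points.

    Assumed definition: the page does not spell out how gaps are computed.
    """
    return round(abs(model_yes_pct - human_yes_pct), 1)


HUMAN_YES = 73.0  # human yes-rate reported for this image

# Yes-rates reported on this page
reported = {
    "legacy GPT-4o baseline": 100.0,
    "closest current model": 85.0,
    "model average": 96.6,
}

for label, yes_pct in reported.items():
    print(f"{label}: {point_gap(yes_pct, HUMAN_YES)} point gap")
```

Under the same assumption, a 73.0 point gap against a 73.0% human yes-rate is only reachable by a model that answered 0.0% yes, since a gap that large in the other direction would need a yes-rate above 100%.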
Generated summary for this photo



Selected human comments
meta-llama/llama-3.2-11b-vision-instruct comments
baidu/ernie-4.5-vl-28b-a3b comments