Human distribution: 7.0% yes, 93.0% no over 656 explicit votes
Model average distribution: 0.2% yes, 99.8% no across the current model set
Models compared: 67 current runs
Closest current model: 11.0% yes
Least aligned models: 7.0 point gap
Legacy GPT-4o baseline: 0.0% yes, a 7.0 point gap against humans
Biggest model gap: 7.0 percentage points on this image
Current classification: People mostly said no

Benchmark image 02
Dodge Van
1979 Dodge RAM van "Sandwich"
A late-70s Dodge van is parked here like someone tried to jailbreak the ontology with Detroit sheet metal. It is the purest negative control in the set: all sandwich discourse, zero mayo.
Under development: this benchmark and its published results are provisional, not final.
At a glance
How this photo split the room
meta-llama/llama-3.2-11b-vision-instruct
64-way tie
Benchmark context
Model spread
How Models Align with Human Responses
This compares each model against human responses to show how closely it aligns with people.
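A minimal sketch of how the per-model gaps could be computed, assuming the gap is the absolute difference between a model's yes rate and the human yes rate, in percentage points. That assumption matches the GPT-4o figures on this page (7.0 - 0.0 = 7.0 points). The function names and the model names below are hypothetical, and the three-model data set is illustrative only, not the benchmark's actual results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    model: str
    yes_rate: float  # percent of runs answering "yes"
    gap: float       # assumed: |model yes rate - human yes rate|, in percentage points


def alignment_report(human_yes: float, model_yes: dict[str, float]) -> list[Alignment]:
    """Return one row per model, sorted from closest to least aligned (ties broken by name)."""
    rows = [Alignment(m, y, round(abs(y - human_yes), 1)) for m, y in model_yes.items()]
    return sorted(rows, key=lambda r: (r.gap, r.model))


def least_aligned(rows: list[Alignment]) -> list[Alignment]:
    """Return every model that shares the largest gap (an N-way tie if several do)."""
    worst = max(r.gap for r in rows)
    return [r for r in rows if r.gap == worst]


if __name__ == "__main__":
    # Hypothetical models; only the human rate (7.0% yes) comes from this page.
    rows = alignment_report(7.0, {"model_a": 11.0, "model_b": 0.0, "model_c": 0.0})
    closest = rows[0]
    worst = least_aligned(rows)
    print(f"closest: {closest.model} ({closest.gap} point gap)")
    print(f"least aligned: {len(worst)}-way tie at {worst[0].gap} points")
```

With these inputs the sketch reports `model_a` as closest with a 4.0 point gap, and `model_b` and `model_c` as a 2-way tie at 7.0 points, the same kind of tie summary this page shows for its least aligned models.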
Vote card
Generated summary for this photo



Selected human comments
meta-llama/llama-3.2-11b-vision-instruct comments
google/gemini-3-flash-preview comments