Human responses: 91.7% yes, 8.3% no (653 explicit votes)
Model average: 99.5% yes, 0.5% no (across the current model set)
Models compared: 67 current runs
Closest current model: 94.0% yes
Least aligned model: 12.7 point gap (the biggest model gap on this image)
Legacy GPT-4o baseline: 100.0% yes, 8.3 point gap against humans
Current classification: People mostly said yes

GCP
People mostly said yes
Benchmark image 06
Grilled Cheese Pineapple
Grilled pineapple, ham & cheese "Sandwich"
Ham, cheese, and pineapple are trapped between toasted bread in a move that feels both culinarily legal and socially destabilizing. The sandwich question is easy; the real benchmark is whether your priors can survive the pineapple.
Under development: this benchmark and its published results are provisional, not final.
At a glance
How this photo split the room
nvidia/nemotron-nano-12b-v2-vl
meta-llama/llama-3.2-11b-vision-instruct
Benchmark context
Model spread
How Models Align with Human Responses
This compares each model's responses with the human responses to show how closely the model aligns with people.
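As a rough illustration of the comparison described here, the sketch below computes a point gap as the absolute difference between a model's "yes" rate and the human "yes" rate. That definition is an assumption, not something the page states, though it is consistent with the reported 8.3 point gap for the legacy GPT-4o baseline. The function name `point_gap` and the dictionary labels are hypothetical.

```python
def point_gap(model_yes_pct: float, human_yes_pct: float) -> float:
    """Absolute difference in "yes" rate, in percentage points.

    Assumed definition: the page reports gaps but does not say how
    they are computed.
    """
    return round(abs(model_yes_pct - human_yes_pct), 1)


HUMAN_YES = 91.7  # human "yes" rate reported for this image

# "Yes" rates reported on this page; the keys are illustrative labels only.
reported = {
    "legacy GPT-4o baseline": 100.0,
    "closest current model": 94.0,
}

gaps = {label: point_gap(pct, HUMAN_YES) for label, pct in reported.items()}

for label, gap in gaps.items():
    print(f"{label}: {gap} point gap")
```

Under this assumption the GPT-4o figure reproduces the 8.3 point gap shown on the page. The 2.3 point result for the closest model is derived here and is not a published number.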
Generated summary for this photo



Selected human comments
nvidia/nemotron-nano-12b-v2-vl comments
meta-llama/llama-3.2-11b-vision-instruct comments