Human: 29.8% yes, 70.2% no (654 explicit votes)
Model average: 10.6% yes, 89.4% no (across the current model set of 67 runs)
Closest current model: 30.0% yes
Biggest model gap (least aligned model): 70.2 percentage points on this image
Legacy GPT-4o baseline: 0.0% yes, a 29.8-point gap against humans
Current classification: Split concept

CIG · Split concept
Benchmark image 18
Cigarette Sandwich
Two slices of bread cradle a row of cigarettes in an image that feels less like cuisine and more like a failed alignment experiment. The structure says sandwich; every other signal says call a therapist.
Under development: this benchmark and its published results are provisional, not final.
At a glance
How this photo split the room
openai/gpt-5.2
anthropic/claude-opus-4.6
Benchmark context
Model spread
How Models Align with Human Responses
This compares each model against human responses to show how closely it aligns with people.
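The page does not spell out how a model's "gap" is computed. The figures it reports (GPT-4o at 0.0% yes sitting 29.8 points from the 29.8% human yes-rate) are consistent with a simple absolute difference in percentage points, so the sketch below assumes that definition. The function name `gap_points` is hypothetical, and the yes-rates are the ones quoted at the top of this page.

```python
# Assumption: a model's "gap" is the absolute difference, in percentage
# points, between its yes-rate and the human yes-rate. This is inferred
# from the reported numbers, not taken from the benchmark's own code.
HUMAN_YES = 29.8  # percent "yes" across 654 human votes (from this page)


def gap_points(model_yes: float, human_yes: float = HUMAN_YES) -> float:
    """Hypothetical helper: absolute percentage-point gap, rounded to one
    decimal place to match how the page displays its figures."""
    return round(abs(model_yes - human_yes), 1)


# Yes-rates reported on this page for individual models and the average.
reported = {
    "legacy GPT-4o baseline": 0.0,
    "closest current model": 30.0,
    "model average": 10.6,
}

for name, yes in reported.items():
    print(f"{name}: {yes:.1f}% yes, gap {gap_points(yes):.1f} points")
```

Under this assumption the GPT-4o baseline reproduces the published 29.8-point gap, and the closest current model (30.0% yes) sits only 0.2 points away from the human rate.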
Generated summary for this photo



Selected human comments
openai/gpt-5.2 comments
anthropic/claude-opus-4.6 comments