Human: 55.7% yes, 44.3% no (655 explicit votes)
Model average: 69.5% yes, 30.5% no (across the current model set)
Closest current model: 54.0% yes
Least aligned models: 55.7 point gap
Legacy GPT-4o baseline: 36.0% yes, a 19.7 point gap against humans
Biggest model gap: 55.7 percentage points on this image
Current classification: Human knife-edge
Models compared: 67 current runs

KFC
Human knife-edge
Benchmark image 19
KFC Double Down
KFC Double-Down "Sandwich"
The Double Down replaces bread with fried chicken fillets and dares the classifier to explain why outer layers must be grain-based. It is a sandwich-shaped act of aggression from the late-capitalist frontier.
Under development: this benchmark and its published results are provisional, not final.
At a glance
How this photo split the room
openai/gpt-5.4
7-way tie
Benchmark context
Model spread
How Models Align with Human Responses
This compares each model against human responses to show how closely it aligns with people.
Human rate marker
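As a rough illustration of the comparison, the sketch below assumes alignment is measured as the absolute difference between a model's yes-rate and the human yes-rate, in percentage points. That assumption matches the figures reported on this page (55.7% human versus 36.0% for the legacy GPT-4o baseline gives the stated 19.7 point gap), but the benchmark's actual scoring code is not shown here. The function name `point_gap` and the dictionary labels are hypothetical.

```python
# Minimal sketch, assuming "gap" means the absolute difference in yes-rates,
# expressed in percentage points. Names here are illustrative, not the
# benchmark's own API.

def point_gap(human_yes: float, model_yes: float) -> float:
    """Return the absolute yes-rate difference in percentage points."""
    return round(abs(human_yes - model_yes), 1)


# Yes-rates reported on this page for benchmark image 19.
HUMAN_YES = 55.7

model_yes_rates = {
    "legacy GPT-4o baseline": 36.0,
    "closest current model": 54.0,
}

# Compute each model's gap against the human yes-rate.
gaps = {name: point_gap(HUMAN_YES, rate) for name, rate in model_yes_rates.items()}

for name, gap in gaps.items():
    print(f"{name}: {gap} point gap")
```

Under the same assumption, the reported 55.7 point gap for the least aligned models would correspond to a 0% yes-rate, since no model can exceed 100% yes.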
Vote card
Generated summary for this photo



Selected human comments
openai/gpt-5.4 comments
google/gemini-3-flash-preview comments