Human distribution: 66.3% yes, 33.7% no over 655 explicit votes.
Model average distribution: 74.0% yes, 26.0% no across the current model set.
Closest current models: 67.0% yes.
Least aligned models: 11-way tie (openai/gpt-5.4-mini, qwen/qwen2.5-vl-32b-instruct, bytedance-seed/seed-1.6, +8 more), 66.3 point gap.
Legacy GPT-4o baseline: 100.0% yes with a 33.7 point gap against humans.
Biggest model gap: 66.3 percentage points on this image.
Current classification: Split concept.
Models compared: 74 current runs.

WIC
Split concept
Benchmark image 16
Waffle Ice Cream
Waffle ice cream "Sandwich"
Ice cream wedged between waffles presents itself as a dessert sandwich with zero shame and excellent marketing instincts. It is not lunch, but it absolutely understands the assignment.
Under development: this benchmark and its published results are provisional, not final.
At a glance
How this photo split the room
Closest models: x-ai/grok-4.20-beta and anthropic/claude-opus-4.8
Least aligned models: 11-way tie
Benchmark context
Model spread
How Models Align with Human Responses
This compares each model against human responses to show how closely it aligns with people.
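The point gaps quoted on this page appear to be absolute differences between a model's yes-rate and the human yes-rate, in percentage points. That reading is an assumption, but it matches the reported figures: the legacy GPT-4o baseline at 100.0% yes against 66.3% human yes gives the stated 33.7 point gap. A minimal sketch of that arithmetic follows; the function name `gap_points` is hypothetical and not part of the benchmark.

```python
def gap_points(model_yes_pct: float, human_yes_pct: float) -> float:
    """Absolute difference between two yes-rates, in percentage points.

    Assumption: the benchmark defines its "point gap" this way. The page
    does not state the formula; it is inferred from the published numbers.
    """
    # Round to one decimal place to match the precision shown on the page.
    return round(abs(model_yes_pct - human_yes_pct), 1)


HUMAN_YES = 66.3  # human yes-rate reported for this image

print(gap_points(100.0, HUMAN_YES))  # legacy GPT-4o baseline
print(gap_points(67.0, HUMAN_YES))   # closest model output
```

Under the same assumption, the 66.3 point gap reported for the least aligned models would correspond to a 0% yes-rate, though the page does not state their yes-rate directly.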