Human distribution: 65.6% yes, 34.4% no (651 explicit votes)
Model average distribution: 59.0% yes, 41.0% no (across the current model set)
Closest current model: 66.0% yes
Least aligned models: 65.6 point gap
Legacy GPT-4o baseline: 26.0% yes (39.6 point gap against humans)
Biggest model gap: 65.6 percentage points on this image
Current classification: Split concept
Models compared: 67 current runs
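The gap figures above appear to be absolute differences between a model's yes-rate and the human yes-rate, measured in percentage points. The sketch below reproduces that arithmetic under this assumption; the function name `point_gap` is hypothetical and is not taken from the benchmark's own code.

```python
def point_gap(human_yes_pct: float, model_yes_pct: float) -> float:
    """Absolute difference between two yes-rates, in percentage points.

    Assumption: the page defines "gap" as |human - model| on the yes-rate.
    Rounded to one decimal to match the precision shown on the page.
    """
    return round(abs(human_yes_pct - model_yes_pct), 1)


human_yes = 65.6  # human yes-rate for this image

print(point_gap(human_yes, 26.0))  # legacy GPT-4o baseline
print(point_gap(human_yes, 66.0))  # closest current model
print(point_gap(human_yes, 0.0))   # a model that never answers yes
```

Under this definition, a 65.6 point gap against a 65.6% human yes-rate corresponds to a model yes-rate of 0%.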

PKL · Split concept
Benchmark image 11
Pickle Sandwich
Pickle, Ham, cheese & tomato "Sandwich"
A hollowed pickle is doing bread cosplay around ham, cheese, and tomato, which is either keto ingenuity or a user trying to adversarially attack the definition. It has sandwich posture, but the cucumber vibes make everyone nervous.
Under development: this benchmark and its published results are provisional, not final.
At a glance
How this photo split the room
x-ai/grok-4-fast
10-way tie
Benchmark context
Model spread
How Models Align with Human Responses
This compares each model's responses with human responses to show how closely the model aligns with people.