
CIG | Split concept

Benchmark image 18

Cigarette Sandwich

Cigarette "Sandwich"

Two slices of bread cradle a row of cigarettes in an image that feels less like cuisine and more like a failed alignment experiment. The structure says sandwich; every other signal says call a therapist.

Under development: this benchmark and its published results are provisional, not final.

Human

29.8% yes, 70.2% no

Model average

10.1% yes, 89.9% no

Most aligned model

0.2 point gap from humans

openai/gpt-5.2

Least aligned model

70.2 point gap from humans

anthropic/claude-opus-4.6

At a glance

How this photo split the room

Human distribution

29.8% yes, 70.2% no over 654 explicit votes.

Model average distribution

10.1% yes, 89.9% no across the current model set.

Closest current model

30.0% yes.

openai/gpt-5.2

Least aligned model

70.2 point gap.

anthropic/claude-opus-4.6

Legacy GPT-4o baseline

0.0% yes with a 29.8 point gap against humans.

Biggest model gap

70.2 percentage points on this image.

Current classification

Split concept
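The gap figures above are consistent with a plain absolute difference between a model's yes rate and the human yes rate, measured in percentage points. The page does not spell out the formula, so the following is a minimal sketch under that assumption; `HUMAN_YES` and `human_gap` are illustrative names, not part of the benchmark.

```python
HUMAN_YES = 29.8  # human yes rate for this image, in percent


def human_gap(model_yes: float, human_yes: float = HUMAN_YES) -> float:
    """Absolute difference in yes rate, in percentage points (assumed definition)."""
    return round(abs(model_yes - human_yes), 1)


# Yes rates taken from the summary cards above.
gaps = {
    "openai/gpt-5.2": human_gap(30.0),
    "anthropic/claude-opus-4.6": human_gap(100.0),
    "GPT-4o (Spring 2024)": human_gap(0.0),
}

for model, gap in gaps.items():
    print(f"{model}: {gap} point gap")
```

Under this definition the three values come out to 0.2, 70.2, and 29.8 points, matching the closest model, the least aligned model, and the legacy baseline as reported.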

Benchmark context

Models compared

74 current runs

Model spread

How Models Align with Human Responses

This compares each model against human responses to show how closely it aligns with people.

amazon/nova-pro-v1: 100.0% no, 0.0% yes; human gap 29.8%; rank #73
anthropic/claude-haiku-4.5: 100.0% no, 0.0% yes; human gap 29.8%; rank #52
anthropic/claude-opus-4.5: 100.0% no, 0.0% yes; human gap 29.8%; rank #56
anthropic/claude-opus-4.7: 100.0% no, 0.0% yes; human gap 29.8%; rank #38
anthropic/claude-opus-4.8: 100.0% no, 0.0% yes; human gap 29.8%; rank #40
anthropic/claude-sonnet-4.6: 100.0% no, 0.0% yes; human gap 29.8%; rank #62
baidu/ernie-4.5-vl-28b-a3b: 100.0% no, 0.0% yes; human gap 29.8%; rank #69
bytedance-seed/seed-1.6: 100.0% no, 0.0% yes; human gap 29.8%; rank #41
bytedance-seed/seed-1.6-flash: 100.0% no, 0.0% yes; human gap 29.8%; rank #20
google/gemini-2.5-flash: 100.0% no, 0.0% yes; human gap 29.8%; rank #21
google/gemini-2.5-flash-lite: 100.0% no, 0.0% yes; human gap 29.8%; rank #54
google/gemini-2.5-pro: 100.0% no, 0.0% yes; human gap 29.8%; rank #25
google/gemini-3-flash-preview: 100.0% no, 0.0% yes; human gap 29.8%; rank #75
google/gemini-3-pro-image-preview: 100.0% no, 0.0% yes; human gap 29.8%; rank #42
google/gemini-3.1-flash-image-preview: 100.0% no, 0.0% yes; human gap 29.8%; rank #24
google/gemini-3.1-flash-lite-preview: 100.0% no, 0.0% yes; human gap 29.8%; rank #55
google/gemini-3.1-pro-preview: 100.0% no, 0.0% yes; human gap 29.8%; rank #45
google/gemma-3-12b-it: 100.0% no, 0.0% yes; human gap 29.8%; rank #26
google/gemma-3-27b-it: 100.0% no, 0.0% yes; human gap 29.8%; rank #48
GPT-4o (Spring 2024): 100.0% no, 0.0% yes; human gap 29.8%; rank #4
meta-llama/llama-4-maverick: 100.0% no, 0.0% yes; human gap 29.8%; rank #68
meta-llama/llama-4-scout: 100.0% no, 0.0% yes; human gap 29.8%; rank #33
minimax/minimax-01: 100.0% no, 0.0% yes; human gap 29.8%; rank #72
mistralai/pixtral-large-2411: 100.0% no, 0.0% yes; human gap 29.8%; rank #50
moonshotai/kimi-k2.5: 100.0% no, 0.0% yes; human gap 29.8%; rank #13
openai/gpt-4.1-mini: 100.0% no, 0.0% yes; human gap 29.8%; rank #57
openai/gpt-4.1-nano: 100.0% no, 0.0% yes; human gap 29.8%; rank #36
openai/gpt-4o-mini: 100.0% no, 0.0% yes; human gap 29.8%; rank #61
openai/gpt-5.4: 100.0% no, 0.0% yes; human gap 29.8%; rank #59
openai/gpt-5.4-mini: 100.0% no, 0.0% yes; human gap 29.8%; rank #28
openai/gpt-5.4-pro: 100.0% no, 0.0% yes; human gap 29.8%; rank #65
openai/o1: 100.0% no, 0.0% yes; human gap 29.8%; rank #2
openrouter/healer-alpha: 100.0% no, 0.0% yes; human gap 29.8%; rank #10
perplexity/sonar-pro-search: 100.0% no, 0.0% yes; human gap 29.8%; rank #32
qwen/qwen-2-vl-72b-instruct: 100.0% no, 0.0% yes; human gap 29.8%; rank #29
qwen/qwen2.5-vl-32b-instruct: 100.0% no, 0.0% yes; human gap 29.8%; rank #39
qwen/qwen2.5-vl-72b-instruct: 100.0% no, 0.0% yes; human gap 29.8%; rank #70
qwen/qwen3-vl-235b-a22b-instruct: 100.0% no, 0.0% yes; human gap 29.8%; rank #47
qwen/qwen3-vl-30b-a3b-instruct: 100.0% no, 0.0% yes; human gap 29.8%; rank #66
qwen/qwen3-vl-30b-a3b-thinking: 100.0% no, 0.0% yes; human gap 29.8%; rank #22
qwen/qwen3.5-35b-a3b: 100.0% no, 0.0% yes; human gap 29.8%; rank #23
qwen/qwen3.5-flash-02-23: 100.0% no, 0.0% yes; human gap 29.8%; rank #9
x-ai/grok-4: 100.0% no, 0.0% yes; human gap 29.8%; rank #12
x-ai/grok-4.20-beta: 100.0% no, 0.0% yes; human gap 29.8%; rank #17
z-ai/glm-4.6v: 100.0% no, 0.0% yes; human gap 29.8%; rank #58
openai/gpt-5.4-nano: 99.0% no, 1.0% yes; human gap 28.8%; rank #31
openai/gpt-5.5: 99.0% no, 1.0% yes; human gap 28.8%; rank #46
qwen/qwen3.5-9b: 99.0% no, 1.0% yes; human gap 28.8%; rank #27
allenai/molmo-2-8b: 98.0% no, 2.0% yes; human gap 27.8%; rank #6
qwen/qwen3.5-27b: 98.0% no, 2.0% yes; human gap 27.8%; rank #18
nvidia/nemotron-nano-12b-v2-vl: 97.0% no, 3.0% yes; human gap 26.8%; rank #7
qwen/qwen3.5-122b-a10b: 96.0% no, 4.0% yes; human gap 25.8%; rank #11
qwen/qwen3.5-397b-a17b: 96.0% no, 4.0% yes; human gap 25.8%; rank #34
qwen/qwen3.5-plus-02-15: 95.0% no, 5.0% yes; human gap 24.8%; rank #35
bytedance-seed/seed-2.0-lite: 86.0% no, 14.0% yes; human gap 15.8%; rank #14
openai/gpt-4o: 86.0% no, 14.0% yes; human gap 15.8%; rank #15
openai/gpt-4o-2024-11-20: 85.7% no, 14.3% yes; human gap 15.5%; rank #67
bytedance-seed/seed-2.0-mini: 85.0% no, 15.0% yes; human gap 14.8%; rank #19
openai/gpt-5.3-codex: 85.0% no, 15.0% yes; human gap 14.8%; rank #44
openai/o3-pro: 85.0% no, 15.0% yes; human gap 14.8%; rank #53
openai/gpt-4.1: 84.4% no, 15.6% yes; human gap 14.2%; rank #74
openai/gpt-5.3-chat: 83.0% no, 17.0% yes; human gap 12.8%; rank #30
amazon/nova-lite-v1: 81.0% no, 19.0% yes; human gap 10.8%; rank #51
openai/gpt-5.1-chat: 81.0% no, 19.0% yes; human gap 10.8%; rank #8
openai/o1-pro: 80.0% no, 20.0% yes; human gap 9.8%; rank #1
meta-llama/llama-3.2-11b-vision-instruct: 75.0% no, 25.0% yes; human gap 4.8%; rank #3
openai/gpt-5.2: 70.0% no, 30.0% yes; human gap 0.2%; rank #43
amazon/nova-2-lite-v1: 68.0% no, 32.0% yes; human gap 2.2%; rank #60
x-ai/grok-4-fast: 66.0% no, 34.0% yes; human gap 4.2%; rank #5
x-ai/grok-4.1-fast: 62.0% no, 38.0% yes; human gap 8.2%; rank #16
openai/o3: 41.6% no, 58.4% yes; human gap 28.6%; rank #64
openai/gpt-5.1: 40.0% no, 60.0% yes; human gap 30.2%; rank #49
openai/gpt-5.1-codex: 31.0% no, 69.0% yes; human gap 39.2%; rank #37
mistralai/mistral-large-2512: 1.0% no, 99.0% yes; human gap 69.2%; rank #71
anthropic/claude-opus-4.6: 0.0% no, 100.0% yes; human gap 70.2%; rank #63
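
The 10.1% model average reported at the top of the page can be reproduced from the rows above. The page does not say how the average is formed, so this is a sketch under two assumptions: the average is an unweighted mean of per-model yes rates, and the legacy "GPT-4o (Spring 2024)" row is left out, which leaves the 74 current runs. The variable names are illustrative.

```python
from statistics import fmean

# Yes rates (percent) transcribed from the per-model rows above.
# Assumption: the legacy "GPT-4o (Spring 2024)" baseline is excluded,
# so 44 of the 45 rows at 0.0% yes count as current models.
zero_yes_models = 44
nonzero_yes = [
    1.0, 1.0, 1.0, 2.0, 2.0, 3.0, 4.0, 4.0, 5.0,
    14.0, 14.0, 14.3, 15.0, 15.0, 15.0, 15.6, 17.0,
    19.0, 19.0, 20.0, 25.0, 30.0, 32.0, 34.0, 38.0,
    58.4, 60.0, 69.0, 99.0, 100.0,
]

yes_rates = [0.0] * zero_yes_models + nonzero_yes

# Unweighted mean: every model counts once, regardless of sample count.
model_average_yes = fmean(yes_rates)

print(len(yes_rates), round(model_average_yes, 1))  # prints: 74 10.1
```

That the unweighted mean over 74 rows lands on 10.1% suggests, but does not confirm, that the published average is computed this way.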

Selected human comments

it does not have to taste good to be a sandwich, right!?
No, that's just gross.
if you're eating that, you have worse issues than if it's a sandwich

openai/gpt-5.2 comments

Looks like bread topped with cigarette butts and not assembled as a filling between bread; also not really a food item.
Bread with a “filling” (cigarettes) intended to be enclosed between two bread pieces—structurally a sandwich, even if not edible.

anthropic/claude-opus-4.6 comments

Structurally, this is a sandwich: filling between two slices of bread. The filling being cigarette butts makes it inedible and repulsive, but the architecture is undeniably sandwich. Form over function wins here.
Structurally, this is a sandwich: filling between two slices of bread. The filling being cigarette butts makes it inedible and horrifying, but the architecture is undeniably sandwich. Do not eat this.
