HSH

Human knife-edge

Benchmark image 09

Hashbrown Sandwich

Hash brown, egg, bacon, egg & cheese "Sandwich"

A breakfast stack uses hash-brown slabs as the outer chassis for bacon, egg, and cheese, like fast-food R&D got too comfortable with category theory. It is handheld, layered, and deeply committed to making 'bread' feel optional.

Under development: this benchmark and its published results are provisional, not final.

Human

59.4% yes, 40.6% no

Model average

90.2% yes, 9.8% no

Most aligned model

1.4 point gap from humans

openai/gpt-5.5

Least aligned models

42-way tie

openai/o1-pro

openai/o1

openai/gpt-5.1-chat

+39 more

At a glance

How this photo split the room

Human distribution

59.4% yes, 40.6% no over 655 explicit votes.

Model average distribution

90.2% yes, 9.8% no across the current model set.

Closest current model

58.0% yes.

openai/gpt-5.5

Least aligned models

40.6 point gap.

42-way tie

Legacy GPT-4o baseline

84.0% yes with a 24.6 point gap against humans.

Biggest model gap

40.6 percentage points on this image.

Current classification

Human knife-edge
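
The gap figures above are consistent with a simple absolute difference between a model's yes-rate and the human yes-rate, in percentage points. The sketch below assumes that definition; the page does not state the formula, and the name `human_gap` is illustrative rather than part of the benchmark.

```python
HUMAN_YES = 59.4  # human yes-rate for this image, in percent


def human_gap(model_yes: float, human_yes: float = HUMAN_YES) -> float:
    """Absolute yes-rate difference in percentage points (assumed definition)."""
    # Round to one decimal place to match the precision shown on the page.
    return round(abs(model_yes - human_yes), 1)


print(human_gap(58.0))   # openai/gpt-5.5
print(human_gap(84.0))   # legacy GPT-4o baseline
print(human_gap(100.0))  # a model that answered yes every time
```

Under this assumption the three calls reproduce the 1.4, 24.6, and 40.6 point gaps reported for this image.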

Benchmark context

Models compared

74 current runs

Model spread

How Models Align with Human Responses

This compares each model against human responses to show how closely it aligns with people.

qwen/qwen3-vl-30b-a3b-thinking: 78.0% no, 22.0% yes; human gap 37.4%; rank #22
openai/gpt-5.4-pro: 66.0% no, 34.0% yes; human gap 25.4%; rank #65
google/gemini-3-flash-preview: 65.8% no, 34.2% yes; human gap 25.2%; rank #75
google/gemini-3.1-flash-image-preview: 65.0% no, 35.0% yes; human gap 24.4%; rank #24
openai/gpt-5.4-mini: 64.0% no, 36.0% yes; human gap 23.4%; rank #28
x-ai/grok-4: 63.0% no, 37.0% yes; human gap 22.4%; rank #12
x-ai/grok-4.1-fast: 58.0% no, 42.0% yes; human gap 17.4%; rank #16
openai/gpt-5.5: 42.0% no, 58.0% yes; human gap 1.4%; rank #46
openai/gpt-4.1-mini: 29.7% no, 70.3% yes; human gap 10.9%; rank #57
x-ai/grok-4-fast: 25.0% no, 75.0% yes; human gap 15.6%; rank #5
meta-llama/llama-3.2-11b-vision-instruct: 21.0% no, 79.0% yes; human gap 19.6%; rank #3
openai/gpt-5.3-codex: 19.0% no, 81.0% yes; human gap 21.6%; rank #44
GPT-4o (Spring 2024): 16.0% no, 84.0% yes; human gap 24.6%; rank #4
openai/gpt-5.3-chat: 16.0% no, 84.0% yes; human gap 24.6%; rank #30
bytedance-seed/seed-1.6-flash: 13.0% no, 87.0% yes; human gap 27.6%; rank #20
google/gemini-3-pro-image-preview: 11.0% no, 89.0% yes; human gap 29.6%; rank #42
nvidia/nemotron-nano-12b-v2-vl: 11.0% no, 89.0% yes; human gap 29.6%; rank #7
openai/gpt-4o-2024-11-20: 10.4% no, 89.6% yes; human gap 30.2%; rank #67
allenai/molmo-2-8b: 10.0% no, 90.0% yes; human gap 30.6%; rank #6
openrouter/healer-alpha: 8.0% no, 92.0% yes; human gap 32.6%; rank #10
amazon/nova-lite-v1: 7.0% no, 93.0% yes; human gap 33.6%; rank #51
openai/gpt-5.4-nano: 7.0% no, 93.0% yes; human gap 33.6%; rank #31
openai/gpt-5.2: 6.0% no, 94.0% yes; human gap 34.6%; rank #43
amazon/nova-pro-v1: 5.8% no, 94.2% yes; human gap 34.8%; rank #73
google/gemini-3.1-pro-preview: 5.0% no, 95.0% yes; human gap 35.6%; rank #45
qwen/qwen3.5-flash-02-23: 4.0% no, 96.0% yes; human gap 36.6%; rank #9
openai/o3: 3.9% no, 96.1% yes; human gap 36.7%; rank #64
moonshotai/kimi-k2.5: 3.0% no, 97.0% yes; human gap 37.6%; rank #13
minimax/minimax-01: 1.9% no, 98.0% yes; human gap 38.7%; rank #72
amazon/nova-2-lite-v1: 1.0% no, 99.0% yes; human gap 39.6%; rank #60
google/gemini-3.1-flash-lite-preview: 1.0% no, 99.0% yes; human gap 39.6%; rank #55
openai/gpt-4o: 1.0% no, 99.0% yes; human gap 39.6%; rank #15
qwen/qwen-2-vl-72b-instruct: 1.0% no, 99.0% yes; human gap 39.6%; rank #29
anthropic/claude-haiku-4.5: 0.0% no, 100.0% yes; human gap 40.6%; rank #52
anthropic/claude-opus-4.5: 0.0% no, 100.0% yes; human gap 40.6%; rank #56
anthropic/claude-opus-4.6: 0.0% no, 100.0% yes; human gap 40.6%; rank #63
anthropic/claude-opus-4.7: 0.0% no, 100.0% yes; human gap 40.6%; rank #38
anthropic/claude-opus-4.8: 0.0% no, 100.0% yes; human gap 40.6%; rank #40
anthropic/claude-sonnet-4.6: 0.0% no, 100.0% yes; human gap 40.6%; rank #62
baidu/ernie-4.5-vl-28b-a3b: 0.0% no, 100.0% yes; human gap 40.6%; rank #69
bytedance-seed/seed-1.6: 0.0% no, 100.0% yes; human gap 40.6%; rank #41
bytedance-seed/seed-2.0-lite: 0.0% no, 100.0% yes; human gap 40.6%; rank #14
bytedance-seed/seed-2.0-mini: 0.0% no, 100.0% yes; human gap 40.6%; rank #19
google/gemini-2.5-flash: 0.0% no, 100.0% yes; human gap 40.6%; rank #21
google/gemini-2.5-flash-lite: 0.0% no, 100.0% yes; human gap 40.6%; rank #54
google/gemini-2.5-pro: 0.0% no, 100.0% yes; human gap 40.6%; rank #25
google/gemma-3-12b-it: 0.0% no, 100.0% yes; human gap 40.6%; rank #26
google/gemma-3-27b-it: 0.0% no, 100.0% yes; human gap 40.6%; rank #48
meta-llama/llama-4-maverick: 0.0% no, 100.0% yes; human gap 40.6%; rank #68
meta-llama/llama-4-scout: 0.0% no, 100.0% yes; human gap 40.6%; rank #33
mistralai/mistral-large-2512: 0.0% no, 100.0% yes; human gap 40.6%; rank #71
mistralai/pixtral-large-2411: 0.0% no, 100.0% yes; human gap 40.6%; rank #50
openai/gpt-4.1: 0.0% no, 100.0% yes; human gap 40.6%; rank #74
openai/gpt-4.1-nano: 0.0% no, 100.0% yes; human gap 40.6%; rank #36
openai/gpt-4o-mini: 0.0% no, 100.0% yes; human gap 40.6%; rank #61
openai/gpt-5.1: 0.0% no, 100.0% yes; human gap 40.6%; rank #49
openai/gpt-5.1-chat: 0.0% no, 100.0% yes; human gap 40.6%; rank #8
openai/gpt-5.1-codex: 0.0% no, 100.0% yes; human gap 40.6%; rank #37
openai/gpt-5.4: 0.0% no, 100.0% yes; human gap 40.6%; rank #59
openai/o1: 0.0% no, 100.0% yes; human gap 40.6%; rank #2
openai/o1-pro: 0.0% no, 100.0% yes; human gap 40.6%; rank #1
openai/o3-pro: 0.0% no, 100.0% yes; human gap 40.6%; rank #53
perplexity/sonar-pro-search: 0.0% no, 100.0% yes; human gap 40.6%; rank #32
qwen/qwen2.5-vl-32b-instruct: 0.0% no, 100.0% yes; human gap 40.6%; rank #39
qwen/qwen2.5-vl-72b-instruct: 0.0% no, 100.0% yes; human gap 40.6%; rank #70
qwen/qwen3-vl-235b-a22b-instruct: 0.0% no, 100.0% yes; human gap 40.6%; rank #47
qwen/qwen3-vl-30b-a3b-instruct: 0.0% no, 100.0% yes; human gap 40.6%; rank #66
qwen/qwen3.5-122b-a10b: 0.0% no, 100.0% yes; human gap 40.6%; rank #11
qwen/qwen3.5-27b: 0.0% no, 100.0% yes; human gap 40.6%; rank #18
qwen/qwen3.5-35b-a3b: 0.0% no, 100.0% yes; human gap 40.6%; rank #23
qwen/qwen3.5-397b-a17b: 0.0% no, 100.0% yes; human gap 40.6%; rank #34
qwen/qwen3.5-9b: 0.0% no, 100.0% yes; human gap 40.6%; rank #27
qwen/qwen3.5-plus-02-15: 0.0% no, 100.0% yes; human gap 40.6%; rank #35
x-ai/grok-4.20-beta: 0.0% no, 100.0% yes; human gap 40.6%; rank #17
z-ai/glm-4.6v: 0.0% no, 100.0% yes; human gap 40.6%; rank #58
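
The "most aligned" and "least aligned" labels can be recovered from the per-model yes-rates listed above. The sketch below uses a six-model subset of those figures and assumes that alignment means the smallest absolute yes-rate difference from humans; the variable names are illustrative, and the benchmark's own selection logic is not published on this page.

```python
HUMAN_YES = 59.4  # human yes-rate for this image, in percent

# Subset of the per-model yes-rates listed above (percent).
model_yes = {
    "openai/gpt-5.5": 58.0,
    "openai/gpt-4.1-mini": 70.3,
    "x-ai/grok-4-fast": 75.0,
    "qwen/qwen3-vl-30b-a3b-thinking": 22.0,
    "openai/o1": 100.0,
    "openai/o1-pro": 100.0,
}

# Assumed definition: gap = |model yes-rate - human yes-rate|, one decimal place.
gaps = {name: round(abs(yes - HUMAN_YES), 1) for name, yes in model_yes.items()}

# Most aligned model: the smallest gap.
closest = min(gaps, key=gaps.get)

# Least aligned models: every model sharing the largest gap (ties are kept).
worst_gap = max(gaps.values())
tied_worst = sorted(name for name, gap in gaps.items() if gap == worst_gap)

print(closest, gaps[closest])
print(worst_gap, tied_worst)
```

Keeping every model that shares the largest gap, rather than picking one, is what would produce a tie like the 42-way tie reported here, since every model answering 100.0% yes lands on the same 40.6 point gap.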

Selected human comments

I just want you to know that i really think this is a important sawnwhich
That is just art
No, it is a photo of a hash brown sandwich.

openai/gpt-5.5 comments

It’s burger-like, but the outer layers appear to be hash browns rather than bread, so I wouldn’t classify it as a sandwich.
It has stacked fillings like a burger, but the outer layers appear to be hash browns rather than bread, so I’d classify it as a sandwich-like stack, not a sandwich.

openai/gpt-4.1 comments

Though the 'bread' appears to be hash browns, the structure—a filling between two separate pieces—fits the broad definition of a sandwich.
Though the 'bread' is made of hash browns, it encloses fillings between two flat sides, fitting a broad definition of a sandwich.
