Model breakdown
GPT / OpenAI: GPT-4o (Spring 2024)

Profile: GPT-4o; OpenAI
Release: 2024-05-13; historic run, not on OpenRouter
Specs: Not publicly disclosed; 128,000 tokens
Capabilities: Text + Image; general-purpose multimodal work
Training: October 01, 2023; OpenAI

Rank: #4
Alignment score: -269.3
Crowd match: 77.9%
Mean gap: 22.1%
Human match: 77.9%
Best fit: Bacon Lettuce Tomato
Average vote: 62.9% model yes; 62.8% human yes
Workload: 1K evals; 50 iterations; ~838.8K tokens

Photo-by-photo

Model Results

A breakdown of how closely the model's answer to each question matched the human responses.
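
The page labels each per-photo figure an "absolute gap" but does not spell out the formula. The sketch below assumes the gap is the absolute difference, in percentage points, between the model's yes rate and the human yes rate. Under that assumption the human yes rate can be recovered for rows where the model is saturated at 0% or 100%, such as Dodge Van (0.0% yes, gap 7.0%) and Sub Sandwich (100.0% yes, gap 5.5%). The function names `absolute_gap` and `implied_human_yes` are illustrative and do not come from the site.

```python
from typing import Optional


def absolute_gap(model_yes: float, human_yes: float) -> float:
    """Assumed definition: absolute difference of yes rates, in percentage points."""
    return abs(model_yes - human_yes)


def implied_human_yes(model_yes: float, gap: float) -> Optional[float]:
    """Recover the human yes rate when the sign of the gap is unambiguous.

    If the model is at 0% the human rate can only be higher; at 100% it can
    only be lower. For any other model rate the sign is unknown, so return None.
    """
    if model_yes == 0.0:
        return gap
    if model_yes == 100.0:
        return 100.0 - gap
    return None


# Values taken from the page: (photo, model yes %, absolute gap %)
for name, model_yes, gap in [
    ("Dodge Van", 0.0, 7.0),
    ("Sub Sandwich", 100.0, 5.5),
    ("Hot Dog", 82.0, 42.2),
]:
    print(name, implied_human_yes(model_yes, gap))
```

For Hot Dog the model sits at 82.0%, so the gap alone does not say whether the human rate is above or below it, and the helper returns `None` rather than guessing.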

Model: GPT / OpenAI, GPT-4o (Spring 2024). Yes and No give the model's vote split; Gap is the absolute gap.

Photo                  Yes      No       Gap     Read       Human response
(name missing)         100.0%   0.0%     3.7%    Leans yes  People mostly said yes
Photo 02 Dodge Van     0.0%     100.0%   7.0%    Leans no   People mostly said no
Photo 03 Sub Sandwich  100.0%   0.0%     5.5%    Leans yes  People mostly said yes
(name missing)         0.0%     100.0%   40.9%   Leans no   Human knife-edge
(name missing)         100.0%   0.0%     4.4%    Leans yes  People mostly said yes
(name missing)         100.0%   0.0%     8.3%    Leans yes  People mostly said yes
(name missing)         0.0%     100.0%   54.2%   Leans no   Human knife-edge
Photo 08 Hamburger     100.0%   0.0%     27.0%   Leans yes  Split concept
(name missing)         84.0%    16.0%    24.6%   Leans yes  Human knife-edge
Photo 10 Hot Dog       82.0%    18.0%    42.2%   Leans yes  Split concept
(name missing)         26.0%    74.0%    39.6%   Leans no   Split concept
Photo 12 Avocado Tea   100.0%   0.0%     7.2%    Leans yes  People mostly said yes
Photo 13 Panini        100.0%   0.0%     7.6%    Leans yes  People mostly said yes
Photo 14 Cookie PB     46.0%    54.0%    5.5%    Leans no   Human knife-edge
Photo 15 Chicken Wrap  0.0%     100.0%   22.6%   Leans no   Split concept
(name missing)         100.0%   0.0%     33.7%   Leans yes  Split concept
Photo 17 Sloppy Joe    100.0%   0.0%     20.6%   Leans yes  Split concept
(name missing)         0.0%     100.0%   29.8%   Leans no   Split concept
(name missing)         36.0%    64.0%    19.7%   Leans no   Human knife-edge
Photo 20 Bagel PB&J    84.0%    16.0%    37.4%   Leans yes  Human knife-edge
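
The summary figures reported at the top of the page (mean gap 22.1%, human match 77.9%, 62.9% model yes, 1K evals) can be cross-checked against the 20 per-photo rows above. The page does not state how they are derived, so the sketch below rests on three assumptions: the mean gap is the unweighted mean of the per-photo absolute gaps, human match is 100 minus the mean gap, and the eval count is photos times iterations. Variable names are illustrative.

```python
# (model yes %, absolute gap %) for the 20 rows, in page order
rows = [
    (100.0, 3.7), (0.0, 7.0), (100.0, 5.5), (0.0, 40.9), (100.0, 4.4),
    (100.0, 8.3), (0.0, 54.2), (100.0, 27.0), (84.0, 24.6), (82.0, 42.2),
    (26.0, 39.6), (100.0, 7.2), (100.0, 7.6), (46.0, 5.5), (0.0, 22.6),
    (100.0, 33.7), (100.0, 20.6), (0.0, 29.8), (36.0, 19.7), (84.0, 37.4),
]
ITERATIONS = 50  # "50 iterations" from the workload summary

# Assumption: unweighted mean over photos
mean_gap = sum(gap for _, gap in rows) / len(rows)
# Assumption: human match is the complement of the mean gap
human_match = 100.0 - mean_gap
avg_model_yes = sum(yes for yes, _ in rows) / len(rows)
# Assumption: one eval per photo per iteration
total_evals = len(rows) * ITERATIONS

print(f"mean gap      {mean_gap:.1f}%")
print(f"human match   {human_match:.1f}%")
print(f"model yes     {avg_model_yes:.1f}%")
print(f"total evals   {total_evals}")
```

Under these assumptions the recomputed values round to the reported 22.1%, 77.9%, 62.9%, and 1,000 evals. The 62.8% human yes figure cannot be checked the same way, because the rows give only the size of each gap, not its direction.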
GPT-4o (Spring 2024) Sandwich Benchmark Breakdown | opensandwich.ai