Longterm Wiki

Hello GPT-4o | OpenAI

web

Credibility Rating

4/5
High(4)

High quality. Established institution or organization with editorial oversight and accountability.

Rating inherited from publication venue: OpenAI

Official OpenAI announcement of GPT-4o; relevant to AI safety researchers tracking frontier model capabilities, deployment practices, and how safety evaluations are communicated during major model releases.

Metadata

Importance: 55/100 · press release · primary source

Summary

OpenAI announces GPT-4o, a new flagship model capable of processing and generating text, audio, and images in an integrated, real-time manner. GPT-4o matches GPT-4 Turbo on text and code tasks while significantly improving vision and audio capabilities, and is faster and more efficient. It represents a step toward more natural human-computer interaction with end-to-end multimodal processing.

Key Points

  • GPT-4o is OpenAI's first omnimodal model, natively processing text, audio, and images rather than stitching together separate models.
  • It achieves real-time conversational audio responses (avg 320ms latency), much closer to human response times than previous voice pipelines.
  • GPT-4o matches GPT-4 Turbo performance on English text and code benchmarks while being faster and cheaper via API.
  • The model shows improved multilingual, vision, and audio understanding, with new capabilities like detecting emotion from voice and describing visual scenes.
  • OpenAI plans to roll out GPT-4o iteratively, with safety evaluations ongoing for the full audio/vision capabilities before wider release.

Cited by 1 page

Page | Type | Quality
AI-Driven Concentration of Power | Risk | 65.0

Cached Content Preview

HTTP 200 · Fetched Apr 9, 2026 · 16 KB
Hello GPT-4o | OpenAI

The Wayback Machine - http://web.archive.org/web/20260401140356/https://openai.com/index/hello-gpt-4o/

 


Table of contents

Explorations of capabilities

Model evaluations

ChatGPT-4o Risk Scorecard

May 13, 2024
Milestone

Hello GPT‑4o

We’re announcing GPT‑4o, our new flagship model that can reason across audio, vision, and text in real time.

GPT-4o System Card

All videos on this page are at 1x real time.

[Video: Guessing May 13th's announcement]


GPT‑4o (“o” for “omni”) is a step towards much more natural human-computer interaction—it accepts as input any combination of text, audio, image, and video and generates any combination of text, audio, and image outputs. It can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time⁠(opens in a new window) in a conversation. It matches GPT‑4 Turbo performance on text in English and code, with significant improvement on text in non-English languages, while also being much faster and 50% cheaper in the API. GPT‑4o is especially better at vision and audio understanding compared to existing models.
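The announcement describes GPT-4o as a single model accepting mixed text and image inputs through the API. As an illustration only (not part of the announcement), a minimal sketch of assembling such a multimodal request for the official `openai` Python SDK's chat-completions endpoint might look like this; the helper function name, prompt, and image URL are placeholders:

```python
# Illustrative sketch: building a text + image request payload for GPT-4o.
# "gpt-4o" is the model name used at launch; the URL below is a placeholder.

def build_multimodal_request(prompt: str, image_url: str) -> dict:
    """Assemble a chat.completions payload mixing a text part and an image part."""
    return {
        "model": "gpt-4o",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

# Actually sending it requires an API key, e.g.:
#   from openai import OpenAI
#   client = OpenAI()
#   resp = client.chat.completions.create(**build_multimodal_request(
#       "Describe this scene.", "https://example.com/photo.jpg"))

payload = build_multimodal_request("Describe this scene.", "https://example.com/photo.jpg")
```

The payload shape (a list of typed content parts within one user message) is what lets a single request carry both modalities, rather than routing each through a separate model.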

Model capabilities

[Video carousel: Two GPT-4os interacting and singing · Interview prep · Rock Paper Scissors · Sarcasm · Math with Sal and Imran Khan · Two GPT-4os harmonizing]

... (truncated, 16 KB total)
Resource ID: ee605bab036068f0 | Stable ID: YTMxMDE5Mm