Hello GPT-4o | OpenAI
web · Credibility Rating: 4/5
High (4): High quality. Established institution or organization with editorial oversight and accountability.
Rating inherited from publication venue: OpenAI
Official OpenAI announcement of GPT-4o; relevant to AI safety researchers tracking frontier model capabilities, deployment practices, and how safety evaluations are communicated during major model releases.
Metadata
Importance: 55/100 · press release · primary source
Summary
OpenAI announces GPT-4o, a new flagship model that accepts any combination of text, audio, image, and video input and generates text, audio, and image output in an integrated, real-time manner. GPT-4o matches GPT-4 Turbo on English text and code while substantially improving on non-English text, vision, and audio, and it is faster and 50% cheaper in the API. It represents a step toward more natural human-computer interaction through end-to-end multimodal processing.
Key Points
- GPT-4o is OpenAI's first omnimodal model, natively processing text, audio, and images rather than stitching together separate models.
- It achieves real-time conversational audio responses (as little as 232 ms, 320 ms on average), much closer to human response times than previous voice pipelines.
- GPT-4o matches GPT-4 Turbo performance on English text and code benchmarks while being faster and 50% cheaper in the API.
- The model shows improved multilingual, vision, and audio understanding, with new capabilities such as detecting emotion from voice and describing visual scenes.
- OpenAI plans to roll out GPT-4o iteratively, with safety evaluations ongoing for the full audio and vision capabilities before wider release.
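The latency point can be made concrete with a little arithmetic. The 232 ms and 320 ms figures come from the announcement's cached text; the earlier Voice Mode averages of 2.8 s (GPT-3.5) and 5.4 s (GPT-4) are taken from the full OpenAI announcement, which is cut off in the cached preview. The sketch simply divides each legacy average by GPT-4o's average; the constant and function names are illustrative, not from the source.

```python
# Average voice-response latencies in milliseconds.
# Assumption: the legacy Voice Mode figures (2.8 s and 5.4 s) are taken from
# the full OpenAI announcement; the cached preview here is truncated.
LEGACY_VOICE_MODE_MS = {"Voice Mode (GPT-3.5)": 2800, "Voice Mode (GPT-4)": 5400}
GPT4O_AVG_MS = 320  # average audio response latency stated for GPT-4o
GPT4O_MIN_MS = 232  # fastest audio response latency stated for GPT-4o


def speedup(legacy_ms: int, new_ms: int) -> float:
    """Return how many times shorter new_ms is than legacy_ms."""
    return legacy_ms / new_ms


# Compare averages with averages, so the minimum (232 ms) is not used here.
for name, legacy_ms in LEGACY_VOICE_MODE_MS.items():
    ratio = speedup(legacy_ms, GPT4O_AVG_MS)
    print(f"{name}: {legacy_ms} ms vs GPT-4o {GPT4O_AVG_MS} ms average, {ratio:.3g}x shorter")
```

With these inputs the ratios are 8.75x (GPT-3.5 pipeline) and 16.875x (GPT-4 pipeline). Dividing by the 320 ms average rather than the 232 ms minimum keeps the comparison like-for-like, since the legacy figures are also averages.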
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| AI-Driven Concentration of Power | Risk | 65.0 |
Cached Content Preview
HTTP 200 · Fetched Apr 9, 2026 · 16 KB
The Wayback Machine - http://web.archive.org/web/20260401140356/https://openai.com/index/hello-gpt-4o/
Table of contents
Explorations of capabilities
Model evaluations
ChatGPT-4o Risk Scorecard
May 13, 2024
Milestone
Hello GPT‑4o
We’re announcing GPT‑4o, our new flagship model that can reason across audio, vision, and text in real time.
Contributions
GPT-4o System Card
All videos on this page are at 1x real time.
Guessing May 13th’s announcement.
GPT‑4o (“o” for “omni”) is a step towards much more natural human-computer interaction: it accepts as input any combination of text, audio, image, and video and generates any combination of text, audio, and image outputs. It can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in a conversation. It matches GPT‑4 Turbo performance on text in English and code, with significant improvement on text in non-English languages, while also being much faster and 50% cheaper in the API. GPT‑4o is notably better at vision and audio understanding than existing models.
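The input/output description above is asymmetric: video appears among the accepted inputs but not among the generated outputs. A minimal sketch of that constraint as a data model, assuming any non-empty combination within each set is allowed; the `Modality` enum and `is_supported` helper are hypothetical illustrations, not part of OpenAI's API.

```python
from __future__ import annotations

from enum import Enum


class Modality(Enum):
    TEXT = "text"
    AUDIO = "audio"
    IMAGE = "image"
    VIDEO = "video"


# Modality sets as described in the announcement: video is accepted as
# input but is not listed among the outputs.
INPUT_MODALITIES = frozenset(Modality)
OUTPUT_MODALITIES = frozenset({Modality.TEXT, Modality.AUDIO, Modality.IMAGE})


def is_supported(inputs: set[Modality], outputs: set[Modality]) -> bool:
    """Hypothetical check: both sides non-empty and within the stated sets."""
    return (
        bool(inputs)
        and bool(outputs)
        and inputs <= INPUT_MODALITIES
        and outputs <= OUTPUT_MODALITIES
    )
```

For example, an audio-plus-image request asking for a spoken answer passes the check, while a request for video output does not.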
Model capabilities
Two GPT‑4os interacting and singing.
Interview prep.
Rock Paper Scissors.
Sarcasm.
Math with Sal and Imran Khan.
Two GPT‑4os harmonizing.
... (truncated, 16 KB total)
Resource ID: ee605bab036068f0 | Stable ID: YTMxMDE5Mm