
OpenAI Releases AI Voice Models That Translate and Transcribe in Real-Time
In Focus
- OpenAI released three voice models in its API
- The new AI voice tools can be used in education, business, and media
- Users can access the voice models through OpenAI's Realtime API
- Pricing for Translate and Whisper is based on the number of minutes used
OpenAI has released new voice models in its API that enable developers to build apps that can reason, transcribe, and translate conversations. The audio models are designed to create voice experiences that respond more intelligently, feel natural, and act in real time.
How Do OpenAI’s Realtime Voice APIs Work?
OpenAI launched three voice models on May 7, 2026. The first model, GPT-Realtime-2, comes with realistic vocal simulation capabilities. Built with GPT-5-class reasoning to handle difficult requests, the model can hold spoken conversations with users.
The second voice intelligence API is GPT-Realtime-Translate, which supports real-time translation that keeps pace with the speaker. The model can understand more than 70 input languages and produce output in 13 languages.
GPT-Realtime-Whisper is the third AI voice feature released by OpenAI. The model supports speech-to-text transcription, enabling users to convert words to text as speakers talk.
“Together, the models we are launching move real-time audio from simple call-and-response toward voice interfaces that can actually do work: listen, reason, translate, transcribe, and take action as a conversation unfolds,” OpenAI noted in a company statement.
New Voice APIs Have Multiple Use Cases
OpenAI, which is developing an AI-centric smartphone, said the new AI voice tools have a wide range of use cases, including media, education, events, and creator platforms. Enterprises expanding customer service functions will also find the tools useful.
“Voice is becoming one of the most natural ways for people to use software. It lets someone ask for help while driving, change a travel plan while walking through an airport, get support in their preferred language, or move through a task without stopping to type,” OpenAI added.
The AI firm identified three ways that developers use AI voice. One of these is voice-to-action, where users describe their desired output and the system completes the task. Systems-to-action is another way developers apply AI voice. In this approach, applications convert context into voice, advising users on the next step. There is also voice-to-voice, where AI tools support live conversations across tasks, languages, and changing contexts.
OpenAI released the new AI voice models a day after it launched GPT-5.5 Instant and made it the default model on ChatGPT. The company said the new model can reference previous conversations.
How Much Is OpenAI Charging for the New Voice Models?
OpenAI has made the three voice models accessible through its Realtime API. The ChatGPT maker said pricing for Translate and Whisper is based on the number of minutes used, while GPT-Realtime-2 pricing depends on token usage.
While the new AI voice models offer clear benefits for businesses, they can also be misused. OpenAI has taken several measures to prevent the features from being used to generate spam, scams, and other harmful online activity. The company said it has embedded specific triggers in the system so that “conversations can be halted if they are found to violate our harmful content guidelines.”

