
GPT-4o large language model shows off fast new text, audio and image capabilities

[Image: OpenAI GPT-4o. Source: OpenAI]

Microsoft-backed artificial intelligence (AI) company OpenAI has released GPT-4o, with demonstrations showing how the model interacts with users through text, audio and images in near real time.

The o in the product name stands for omni, and the generative pretrained model can respond to audio input about as quickly as a human in conversation: 320 milliseconds on average, OpenAI says.

This is much faster than the earlier GPT-3.5 and GPT-4, which respond with average delays of 2.8 and 5.4 seconds respectively when users talk to them in Voice Mode.

OpenAI attributes the speed gain to moving to a single model: with GPT-4o, all inputs are processed by the same neural network. Before GPT-4o, OpenAI used a three-model pipeline, in which one model converted audio to text, GPT-3.5 or GPT-4 produced a text reply, and a final model turned that text back into sound.

That pipeline discarded a great deal of information: the AI could not discern tone, distinguish multiple speakers or pick up background noises, and it could not convey emotion in its output, making interactions feel less natural.
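The bottleneck in the old pipeline can be sketched in a few lines. This is purely illustrative pseudocode with hypothetical function names, not OpenAI's implementation: the point is that every hand-off between stages narrows the channel to plain text, which is where tone and emotion get lost.

```python
# Illustrative sketch of the old three-model Voice Mode pipeline.
# All function names and return values are hypothetical placeholders.

def transcribe(audio: bytes) -> str:
    """Stage 1: speech-to-text. Tone, speaker identity and background
    sounds are discarded here; only the words survive."""
    return "hello there"  # placeholder transcription

def generate_reply(text: str) -> str:
    """Stage 2: a text-only LLM (GPT-3.5 or GPT-4) produces a text answer."""
    return f"You said: {text}"

def synthesize(text: str) -> bytes:
    """Stage 3: text-to-speech. It can only voice the words it is given,
    so emotional cues lost in stage 1 cannot be restored."""
    return text.encode("utf-8")  # placeholder audio

def old_voice_mode(audio: bytes) -> bytes:
    # Each hand-off reduces the signal to plain text.
    return synthesize(generate_reply(transcribe(audio)))

reply_audio = old_voice_mode(b"...")
```

By contrast, GPT-4o processes audio in and audio out within one network, so there is no text-only middle stage for information to fall through.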

The new GPT-4o model is OpenAI's new flagship product, with the company saying it can do pretty much everything the earlier top-of-the-line product, GPT-4 Turbo, can.

Certainly, OpenAI's marketing demos of GPT-4o point to some astonishing capabilities, as in a demonstration clip shared by the company's Greg Brockman.

Free version of ChatGPT upgrades

OpenAI said it will roll out GPT-4o to users of ChatGPT Free, but didn't say when. The first to get GPT-4o are customers of ChatGPT Plus and Team, and after that, Enterprise version users.

Currently, ChatGPT Free users get the older GPT-3.5 model; using GPT-4 or GPT-4 Turbo requires a monthly subscription.

When GPT-4o comes to ChatGPT Free, users will, in addition to getting GPT-4-class AI, be able to pull in responses not just from the model itself (which has a training data cut-off date of October 2023) but from the web too.

ChatGPT being able to browse the web should improve the accuracy of its responses.

The AI vendor also said ChatGPT Free users will be able to access GPT-4o features such as data and chart analysis, upload files, and have the AI remember past conversations, among other things.

There's no free lunch, however. OpenAI said the free version of ChatGPT will come with usage limits and will switch down to GPT-3.5 when those are reached.
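The fallback behaviour OpenAI describes amounts to simple model selection based on usage. The sketch below is a hypothetical illustration of that logic; the cap value and function name are assumptions, not OpenAI's actual limits or code:

```python
# Hypothetical sketch of free-tier fallback: requests use GPT-4o until a
# usage cap is hit, then drop to GPT-3.5. The cap is an assumed value.

FREE_TIER_CAP = 10  # assumed number of GPT-4o messages per time window

def pick_model(messages_used: int) -> str:
    """Return the model a free-tier request would be routed to."""
    return "gpt-4o" if messages_used < FREE_TIER_CAP else "gpt-3.5-turbo"

model_within_limit = pick_model(3)   # under the cap
model_over_limit = pick_model(25)    # over the cap: downgraded
```

OpenAI has not published the exact limits, so treat the threshold purely as a placeholder for whatever quota applies.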

Nevertheless, GPT-4o coming to ChatGPT Free suggests OpenAI has managed to lower the resource cost of using the model, compared to its prior LLMs. 

ChatGPT desktop app for macOS

Until now, access to the different GPT models has been through a web interface or, in code, through OpenAI's application programming interfaces (APIs). The company has now released an alpha (very early code that's likely to be buggy) version of ChatGPT for Apple's macOS.
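For readers curious what API access looks like, here is a minimal sketch of a Chat Completions request body targeting GPT-4o. Actually sending it requires an OpenAI API key and an HTTP POST to the API endpoint; this sketch only builds the JSON payload to show the shape of the call, and the example prompt is of course an assumption:

```python
import json

# Build a minimal Chat Completions request body for the GPT-4o model.
# Sending it would require an API key and an HTTPS POST; here we only
# construct and serialise the payload.
payload = {
    "model": "gpt-4o",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarise GPT-4o's new capabilities."},
    ],
}

body = json.dumps(payload)
```

The same payload shape works for GPT-4 and GPT-3.5 Turbo; switching models is just a matter of changing the `model` field.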

This supports voice conversations, and will handle GPT-4o's audio and video capabilities in the future, OpenAI said.

Subscribers to OpenAI's Plus plan will get the macOS ChatGPT app first; it will become more broadly available over the coming weeks, and the AI vendor is working on a Windows version as well.

