OpenAI launches all-singing and talking, but not yet dancing, GPT-4o

GPT-4o large language model shows off fast new text, audio and image capabilities

Technology / news

GPT-4o large language model shows off fast new text, audio and image capabilities

14th May 24, 12:47pm by Juha Saarinen

Source: OpenAI

Microsoft-backed artificial intelligence company OpenAI has released GPT-4o, with demonstrations showing how the artificial intelligence (AI) model interacts with users through text, audio and images in near real-time.

The o in the product name stands for omni, and the generative pretrained model is able to respond to audio input with similar response times to humans. OpenAI says this is 320 milliseconds on average.

This is much faster than the earlier GPT-3.5 and GPT-4, which have on average delays of 2.8 and 5.4 seconds when users talk to them with Voice Mode.

OpenAI said the speed gain is due to moving to a single model with GPT-4o, with all inputs being processed on the same neural network. Prior to GPT-4o, OpenAI used a three-model pipeline with audio-to-text conversion, GPT 3.5/4 outputting text and a final model turning it into sound again.

That last process meant a great deal of information was lost, with the AI not being able to discern tone, multiple speakers, background noises and couldn't output emotion, making it less realistic to use.

The new GPT-4o model is OpenAI's new flagship product, with the company saying it can do pretty much everything the earlier top-of-the-line product, GPT-4 Turbo, can.

Certainly, OpenAI's marketing demos of GPT-4o point to some astonishing capabilities, as in this clip from the company's Greg Brockman:

Introducing GPT-4o, our new model which can reason across text, audio, and video in real time.

It's extremely versatile, fun to play with, and is a step towards a much more natural form of human-computer interaction (and even human-computer-computer interaction): pic.twitter.com/VLG7TJ1JQx
— Greg Brockman (@gdb) May 13, 2024

Free version of ChatGPT upgrades

OpenAI said it will roll out GPT-4o to users of ChatGPT Free, but didn't say when. The first to get GPT-4o are customers of ChatGPT Plus and Team, and after that, Enterprise version users.

Currently, ChatGPT Free users get the older GPT-3.5 model; using GPT-4 or GPT-4 Turbo requires a monthly subscription.

When GPT-4o comes to ChatGPT Free, users will apart from getting GPT-4 AI have the ability to pull in responses not just from the model itself (which has a training data cut-off date of October 2023) but the web too.

ChatGPT being able to browse the web should help with the response accuracy for the AI.

The AI vendor also said ChatGPT Free users will be able to access GPT-4o features like data and chart analysis, upload files, and have the AI remember conversations among other things.

There's no free lunch however. OpenAI said the free version of ChatGPT will come with usage limits and switch down to GPT-3.5 when those are reached.

Nevertheless, GPT-4o coming to ChatGPT Free suggests OpenAI has managed to lower the resource cost of using the model, compared to its prior LLMs.

ChatGPT desktop app for macOS

Accessing the different GPTs have been through a web interface, or via code through OpenAIs application programming interfaces until now. The company has released a new alpha (very early code that's likely to be buggy) version of ChatGPT for Apple's macOS.

This supports voice conversations, and will handle GPT-4o's audio and video capabilities in the future, OpenAI said.

Subscribers to OpenAI's Plus plan will get the macOS ChatGPT app first, and it'll become more broadly available over the coming weeks with the AI vendor working on a Windows version as well.

We welcome your comments below. If you are not already registered, please register to comment.

Remember we welcome robust, respectful and insightful debate. We don't welcome abusive or defamatory comments and will de-register those repeatedly making such comments. Our current comment policy is here.

3 Comments

by Hamish | 14th May 24, 1:01pm 1715648472

So you can plagiarize even quicker now?

https://www.businessinsider.com/openai-destroyed-ai-training-datasets-l…

by ShoreThing | 14th May 24, 4:59pm 1715662753

ChatGPT being able to browse the web should help with the response accuracy for the AI.

Or, you know, get owned by malicious prompt injection.

by beau | 15th May 24, 7:26am 1715714767

Natural, real-time language translation. What an incredible gift.