The AI Race Accelerates: OpenAI Launches GPT-4o

May 14, 2024

Jesso Clarence
CTO

OpenAI has once again set a new standard in the AI landscape with the release of GPT-4o on May 13, 2024, shortly after Meta’s ambitious Llama 3. This launch reaffirms that in the AI race, there is no room for laggards. For an entrepreneur, product owner, or engineering leader, GPT-4o has four significant implications for AI-based products:

1. Boost in Quality

[Benchmark chart comparing GPT-4o with leading models. Source: OpenAI]

The new model performs significantly better than the best models currently available. That is a free boost for any product that uses an LLM on the backend to write code or perform analysis, and as the sketch below shows, the upgrade is often a one-line model change.
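A minimal sketch of such an upgrade, assuming the official `openai` Python SDK (v1.x) and an `OPENAI_API_KEY` in the environment; the `review_code` helper is hypothetical:

```python
# Hypothetical sketch: upgrading an LLM-backed code-review feature to GPT-4o.
# Assumes the official openai Python SDK (v1.x) and OPENAI_API_KEY in the env.
from openai import OpenAI

client = OpenAI()

def review_code(snippet: str, model: str = "gpt-4o") -> str:
    """Ask the model to review a code snippet; swapping `model` is the whole upgrade."""
    response = client.chat.completions.create(
        model=model,  # was "gpt-4-turbo" before the upgrade
        messages=[
            {"role": "system", "content": "You are a careful code reviewer."},
            {"role": "user", "content": f"Review this code:\n\n{snippet}"},
        ],
    )
    return response.choices[0].message.content

print(review_code("def add(a, b): return a - b"))
```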

GPT-4o is on par with GPT-4 Turbo in text and reasoning performance, while coding ability is markedly improved. Audio, vision, and multilingual capabilities see significant gains thanks to native multimodality.

The model has scored a new high of 88.7% on MMLU, a benchmark of general-knowledge questions, setting a new bar for the reasoning capabilities of AI models.

On audio translation and speech recognition, the model far outperforms OpenAI’s own Whisper-v3.

[Elo rating chart from the LMSYS Chatbot Arena. Source: Twitter]

An Elo graph from LMSYS shows a jump of nearly 100 points in GPT-4o’s Elo score, highlighting its superior performance.

2. Lower Token Cost

The token cost for GPT-4o is 50% lower than GPT-4 Turbo’s, significantly reducing the operational expenses (OPEX) of AI products. For use cases without large, predictable workloads, it was already cheaper to consume OpenAI’s APIs than to host your own model; now it is even more cost-effective. Even so, the hope remains that the open-source model community will continue to make rapid progress.
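To make the OPEX impact concrete, here is a back-of-the-envelope sketch using launch-day list prices (GPT-4o at $5 per million input tokens and $15 per million output, versus $10/$30 for GPT-4 Turbo; verify against OpenAI’s current pricing page):

```python
# Back-of-the-envelope OPEX comparison, using launch-day list prices
# (USD per 1M tokens). Verify against OpenAI's current pricing page.
PRICES = {
    "gpt-4-turbo": {"input": 10.00, "output": 30.00},
    "gpt-4o":      {"input": 5.00,  "output": 15.00},
}

def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a month's token volume on a given model."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Example workload: 200M input + 50M output tokens per month.
for model in PRICES:
    print(f"{model}: ${monthly_cost(model, 200_000_000, 50_000_000):,.2f}/month")
# gpt-4-turbo: $3,500.00/month  ->  gpt-4o: $1,750.00/month (50% lower)
```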

3. Faster Inference

Faster inference leads to a better user experience. Like reduced OPEX and quality improvements, prompt responses to customers significantly enhance product quality. GPT-4o can respond to audio inputs in as little as 232 milliseconds, with an average response time of 320 milliseconds, closely mimicking human conversational timing.
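If you want to gauge the speed-up for your own workload, you could time a round trip against each model. This minimal sketch (ours, not OpenAI’s) measures text-completion latency only; it does not reproduce the 232 ms figure, which refers to GPT-4o’s native audio mode:

```python
# Minimal latency check for text completions (our sketch; the 232 ms /
# 320 ms figures refer to GPT-4o's native audio mode, not this path).
import time
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def time_round_trip(model: str, prompt: str = "Say hello in five words.") -> float:
    """Return wall-clock seconds for one completion round trip."""
    start = time.perf_counter()
    client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return time.perf_counter() - start

for model in ("gpt-4-turbo", "gpt-4o"):
    print(f"{model}: {time_round_trip(model):.2f}s")
```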

4. Multimodal - Giving Rise to Entirely New Use Cases

Native multimodal support enables entirely new product categories. While previous GPT releases made existing products better, cheaper, and faster, GPT-4o opens the door to new possibilities.

Previously, OpenAI used separate models to handle voice: one to transcribe input audio to text, the GPT engine to process that text, and a third to convert the output text back to audio. The GPT engine therefore missed information such as tone, multiple speakers, or background noise, and it couldn’t emote, laugh, or sing at the output end. Now, like Gemini, GPT-4o is natively multimodal, overcoming these limitations.
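For contrast, here is roughly what that old three-model voice pipeline looked like with OpenAI’s APIs; acoustic detail is discarded at the very first step. This is an illustrative sketch, and the file paths are ours:

```python
# The pre-GPT-4o voice pipeline: three separate models, with tone,
# speakers, and background noise discarded at the transcription step.
# Assumes the official openai Python SDK; file paths are illustrative.
from openai import OpenAI

client = OpenAI()

# 1) Speech -> text (tone, speakers, and background sound are lost here).
with open("question.wav", "rb") as audio_in:
    transcript = client.audio.transcriptions.create(
        model="whisper-1", file=audio_in
    )

# 2) Text -> text (the GPT engine only ever sees the flat transcript).
reply = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": transcript.text}],
)

# 3) Text -> speech (the voice cannot emote, laugh, or sing on cue).
speech = client.audio.speech.create(
    model="tts-1", voice="alloy", input=reply.choices[0].message.content
)
with open("answer.mp3", "wb") as audio_out:
    audio_out.write(speech.content)
```

A natively multimodal model collapses this chain into a single model that consumes and produces audio directly, which is what unlocks the new use cases.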

Moreover, making this state-of-the-art model available for free in ChatGPT will drive broader awareness of AI’s capabilities, which users of the free tier have so far underestimated. OpenAI’s release of GPT-4o in the free tier is a bright spot, potentially expanding the boundaries of AI applications and possibilities.

A wave of new products built on GPT-4o is on the horizon. If you want to explore how these improvements can impact your product or business, schedule a free brainstorming session with us.

Let’s build the future together!
