OpenAI launches model built on Cerebras chip technology

Meet the dinner plate chip

You know those small square chips inside your gadgets? This isn’t one of them. OpenAI just started using a chip from a company called Cerebras that is literally the size of a dinner plate. It’s massive.

Instead of cutting up a silicon wafer into hundreds of tiny pieces, Cerebras keeps it whole to make one gigantic brain. This huge size lets it move data incredibly fast, which means your commands get answered in the blink of an eye rather than after a long pause.

It’s all about the speed

Why does size matter for you? It’s all about speed. The new model, called GPT-5.3 Codex Spark, can churn out over 1,000 tokens per second. A token is a small chunk of text, usually a word, part of a word, or a piece of code. To put that in plain English, it feels like talking to a friend who finishes your sentences instantly, rather than one who stares at the ceiling for ten seconds thinking.

For people writing code, that speed keeps them in the zone without annoying interruptions. Benchmarks from Cerebras and its cloud partners show similar OpenAI models running at around 3,000 tokens per second, several times faster than many GPU-based setups.
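
To make those rates concrete, here is a rough back-of-the-envelope sketch in Python. Only the 1,000 and 3,000 tokens-per-second figures come from the reporting above. The 500-token reply length and the 100 tokens-per-second baseline are illustrative assumptions, not numbers from OpenAI or Cerebras.

```python
# Rough arithmetic: how long a reply takes to stream at different speeds.
# REPLY_TOKENS and the 100 tokens/s baseline are assumptions for illustration.

REPLY_TOKENS = 500

RATES = {
    "assumed slower baseline": 100,
    "Codex Spark (reported)": 1_000,
    "similar models on Cerebras (reported)": 3_000,
}


def seconds_to_stream(tokens: int, tokens_per_second: float) -> float:
    """Return how many seconds it takes to emit `tokens` at a steady rate."""
    return tokens / tokens_per_second


if __name__ == "__main__":
    for label, rate in RATES.items():
        print(f"{label}: {seconds_to_stream(REPLY_TOKENS, rate):.2f} s")
```

Under these assumptions, the same reply that takes five seconds on the slower baseline arrives in half a second at 1,000 tokens per second.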

Built for the “oops, let me fix that” moment

Have you ever told your smart speaker to do something and then immediately yelled, “Wait, no”? Normal AI hates that. It usually has to finish its slow thought process before listening to you again. This new model is different. It’s designed to be interrupted.

If a coder changes their mind halfway through a task, they can stop the AI mid-sentence and send it in a new direction. It makes working with AI feel less like giving orders to a robot and more like brainstorming with a buddy. OpenAI specifically tuned it for this back-and-forth style of collaboration.
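
For the curious, the idea of an interruptible reply can be sketched in a few lines of Python. This is a toy illustration only, not how OpenAI implements it: `stream_tokens` and `run_with_interrupt` are hypothetical names, and a real system would cancel work on the server rather than stop walking through a list.

```python
import threading
from typing import Iterable, Iterator


def stream_tokens(tokens: Iterable[str], stop: threading.Event) -> Iterator[str]:
    """Yield tokens one at a time, checking for an interrupt before each one."""
    for token in tokens:
        if stop.is_set():
            return  # the user interrupted, so abandon the rest of the reply
        yield token


def run_with_interrupt(tokens: list[str], interrupt_after: int) -> list[str]:
    """Collect streamed tokens, simulating a user who interrupts partway."""
    stop = threading.Event()
    received = []
    for token in stream_tokens(tokens, stop):
        received.append(token)
        if len(received) == interrupt_after:
            stop.set()  # stands in for the user typing "wait, no"
    return received


if __name__ == "__main__":
    reply = ["Renaming", "every", "variable", "in", "the", "file"]
    print(run_with_interrupt(reply, interrupt_after=3))
    # prints ['Renaming', 'every', 'variable']
```

The point of the sketch is that the interrupt is checked between tokens, so the faster tokens arrive, the sooner a change of mind takes effect.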

A special tool, not a superweapon

This new Spark model isn’t trying to be the smartest kid in class. It’s actually a lighter, faster version of OpenAI’s super-smart coding model. Think of it like a race car versus a dump truck. The race car (Spark) is perfect for quick edits, running small tests, and fixing tiny bits of code right now.

The dump truck is for the heavy lifting. OpenAI knows that sometimes you need raw speed over raw power, so they built a tool just for that job. It defaults to making only small, targeted changes unless you ask for more.

Little-known fact: This single chip packs over 4 trillion transistors and 900,000 AI-optimized cores. For comparison, a high-end gaming PC chip might have around 20 billion transistors. You’d need about 200 of those just to match one Cerebras chip.

Saying goodbye to the waiting game

If you’ve ever used a coding assistant, you know the drill: you ask it to fix a line of code, and then you stare at the screen waiting for it to stop thinking. That lag is called latency. OpenAI says they cut that waiting time in half with this new setup.

By using the Cerebras chips, they’ve removed the traffic jams that happen when data has to travel too far inside the computer. It’s like taking a car off a crowded highway and putting it on a clear, private racetrack. The chip has a whopping 44 gigabytes of super-fast memory built right onto it.

Little-known fact: OpenAI CEO Sam Altman has been personally invested in Cerebras since its early days, long before this partnership went public. The two companies have been quietly talking about working together since 2017.

Why OpenAI is looking beyond Nvidia

For years, Nvidia has been the king of AI chips. They make the brains behind almost every smart tool out there. But OpenAI is now looking for other friends to play with. They’re not kicking Nvidia to the curb; they still use them for the really big stuff.

But they’re adding new suppliers like Cerebras, so they aren’t putting all their eggs in one basket. It’s smart shopping; if one store is busy, you go to another. OpenAI says GPUs are still “foundational” for training their biggest models, but Cerebras excels at instant responses.

The secret weapon: on-chip memory

Here’s the geeky secret to the speed: it’s all about where the memory lives. Normal chips have to go grab information from faraway memory banks, which takes time. Cerebras packs the memory right onto the same giant chip.

It’s like having your tools stored on your workbench instead of locked in a shed in the backyard. This on-chip memory is roughly 1,000 times faster than the usual stuff, which is why the AI can answer you so quickly. The chip delivers 21 petabytes per second of memory bandwidth, which is about 1,000 times more than Nvidia’s upcoming Rubin GPU.
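
One way to see why bandwidth matters: a chip can only work as fast as it can read its memory. The sketch below uses the 44 gigabytes and 21 petabytes per second quoted in this article and, as an assumption, treats “roughly 1,000 times” as exactly 1,000. It ignores everything else that affects real-world speed.

```python
# Back-of-the-envelope: how long does one full pass over on-chip memory take?
ON_CHIP_BYTES = 44e9          # 44 gigabytes (from the article)
ON_CHIP_BYTES_PER_S = 21e15   # 21 petabytes per second (from the article)
ASSUMED_SLOWDOWN = 1_000      # "roughly 1,000 times" treated as exact: an assumption


def sweep_seconds(num_bytes: float, bytes_per_second: float) -> float:
    """Return the time to read `num_bytes` once at a steady bandwidth."""
    return num_bytes / bytes_per_second


if __name__ == "__main__":
    fast = sweep_seconds(ON_CHIP_BYTES, ON_CHIP_BYTES_PER_S)
    slow = sweep_seconds(ON_CHIP_BYTES, ON_CHIP_BYTES_PER_S / ASSUMED_SLOWDOWN)
    print(f"on-chip memory: {fast * 1e6:.1f} microseconds per pass")
    print(f"1,000x slower memory: {slow * 1e3:.1f} milliseconds per pass")
```

Under these assumptions, one pass over the memory takes about 2 microseconds on-chip versus about 2 milliseconds on the slower memory.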

A $10 billion friendship

This partnership isn’t a cheap trial run. Last month, OpenAI signed a deal worth more than $10 billion to use Cerebras hardware. That’s a serious commitment to making their chatbots faster. They plan to roll out a massive amount of this new computing power in stages over the next few years.

By 2028, they want a huge system up and running, dedicated to making sure their AI responds to you without those annoying spinning wheels. The deal covers up to 750 megawatts of computing power, making it the largest high-speed AI inference deployment ever announced.

Not just faster, but smarter about edits

One annoying thing about AI coders is that they love to rewrite your whole page just to fix one typo. It’s like hiring a painter to touch up a wall and coming home to find they repainted the whole house. The new Spark model defaults to making only small, targeted changes.

Unless you specifically ask it to run a test or rewrite a big chunk, it just fixes the tiny problem you pointed out. It respects your work and doesn’t try to take over the whole project. This lightweight approach keeps things fast and predictable.

The competition is heating up

OpenAI isn’t the only game in town anymore. Rivals like Google and Anthropic are racing to win over programmers with their own fast coding modes and agent tools, while running high-profile marketing campaigns to build brand loyalty.

If you’re a developer, every second counts. Choosing the tool that feels instant instead of sluggish is a no-brainer, so this speed boost is a major weapon in that battle. ChatGPT now has over 800 million weekly users, but competitors are gaining ground fast.

What happens in a second?

So, what can you actually do with this speed? Imagine asking the AI to check a specific line of code for bugs, fix it, and then test it, all while you take a sip of coffee. Because the model can process over 1,000 tokens per second, these tiny tasks feel like magic.

It creates a flow where you don’t have to stop and wait, which makes building software feel more like having a conversation and less like filling out paperwork. OpenAI also reports cutting network and API overhead by about 80% with a persistent WebSocket path and a slimmer inference stack, so the whole request pipeline is faster, not just the chip.
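
Why does trimming overhead matter so much once the chip is fast? A toy latency model in Python shows the effect. Only the 80% reduction and the 1,000 tokens-per-second rate come from the reporting above. The 200-millisecond overhead, the 100-token reply, and the 100 tokens-per-second comparison are made-up values chosen for illustration.

```python
# Toy latency model: total wait = fixed overhead + time spent generating tokens.
# OVERHEAD_MS and REPLY_TOKENS are made-up illustrative values; only the 80%
# reduction and the 1,000 tokens/s rate come from the article.

OVERHEAD_MS = 200.0
REPLY_TOKENS = 100
OVERHEAD_CUT = 0.80


def total_latency_ms(overhead_ms: float, tokens: int, tokens_per_second: float) -> float:
    """Return fixed overhead plus generation time, in milliseconds."""
    return overhead_ms + 1000.0 * tokens / tokens_per_second


def trimmed_overhead_ms(overhead_ms: float, cut: float) -> float:
    """Return the overhead that remains after removing the fraction `cut`."""
    return overhead_ms * (1.0 - cut)


if __name__ == "__main__":
    for rate in (100, 1_000):
        before = total_latency_ms(OVERHEAD_MS, REPLY_TOKENS, rate)
        after = total_latency_ms(
            trimmed_overhead_ms(OVERHEAD_MS, OVERHEAD_CUT), REPLY_TOKENS, rate
        )
        saved = 100.0 * (before - after) / before
        print(f"{rate} tokens/s: {before:.0f} ms -> {after:.0f} ms ({saved:.0f}% less waiting)")
```

Under these assumptions, the same overhead cut trims the wait by roughly 13% on the slower setup but by more than half on the faster one. Once generation is quick, the plumbing around it becomes the bottleneck.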

Little-known fact: At 1,000 tokens per second, this model could read an entire short novel in about one minute. That’s faster than most humans can flip the pages.
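
The novel comparison is easy to check with a line or two of arithmetic. The figure of about 0.75 English words per token is a common rule of thumb and is an assumption here, not a number from OpenAI’s announcement.

```python
TOKENS_PER_SECOND = 1_000        # from the article
ASSUMED_WORDS_PER_TOKEN = 0.75   # rough rule of thumb for English: an assumption


def words_in(seconds: float) -> float:
    """Estimate how many words are processed in `seconds` at the quoted rate."""
    return seconds * TOKENS_PER_SECOND * ASSUMED_WORDS_PER_TOKEN


if __name__ == "__main__":
    print(f"about {words_in(60):,.0f} words in one minute")
```

That works out to about 45,000 words in a minute, which is roughly short-novel territory.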

The future is a blur of speed

OpenAI has a bigger dream than just fixing code faster. They want to build an assistant that can juggle multiple things at once. Imagine asking it to fix one bug while also researching a totally different problem in the background.

The new fast chip is the first step. It handles the quick back-and-forth chatter, while slower, smarter models can work behind the scenes on the tough stuff. Eventually, you won’t even notice the difference; it will all just happen. The goal is to blend instant responses with deep, thoughtful reasoning seamlessly.

Why this matters for the rest of us

Even if you never write a line of code, this rollout is an early preview of where chatbots are headed. The same low-latency tricks that make Codex-Spark feel instant could eventually show up in consumer chatbots, customer service agents, and virtual assistants.

If you’ve ever yelled at your phone because it was too slow, this is the fix. OpenAI is paving the way for a world where AI doesn’t make you wait. It responds faster than you can think, which is the whole point of having a smart assistant in the first place.

This slideshow was made with AI assistance and human editing.