7 min read

An AI coding competition has just named a winner, but the winning score was shockingly low. Out of all the test-takers, the top performer got only 7.5 percent correct. That's not a typo; that's the actual winning score.
This wasn’t a broken challenge or a rushed trial. It was a carefully crafted test meant to expose how AI handles real-world programming bugs. And for now, even the best tools are falling surprisingly short.

The top performer wasn’t a big name from Silicon Valley. It was Eduardo Rocha de Andrade, a Brazilian prompt engineer, who took home $50,000 in prize money. He earned it by solving just a tiny slice of the test.
It's not that he failed; it's that the test was brutally tough. Solving even a small percentage of the problems showed real skill under tight rules and limited computing resources.

Scoring under 10 percent may sound awful, but in this case, it proves how difficult the challenge really was. The test pushed AI to deal with new, unsolved coding problems.
In an age where people assume machines can do anything, the low score acts like a hard stop. It says we still have miles to go before AI can write reliable software without human help.

Andy Konwinski helped launch the challenge and is offering $1 million to the first open source model that can score above 90 percent. The cash isn’t just for fun; it’s a signal to take this seriously.
He wants developers around the world to push boundaries without needing huge servers or closed systems. The goal is to build smarter AI tools that are open, fair, and ready to work in real-world situations.

Some major AI labs didn’t take part in the opening round. That wasn’t a fluke; the rules made things harder for heavyweight models that rely on massive computing power.
The test runs offline using limited resources, leveling the field for smaller models. It’s not about size or fame. It’s about who can actually solve difficult code problems with efficient thinking and cleaner tools.

This test avoids a problem called benchmark contamination, which happens when models train on the very problems they'll later be tested on. The K Prize sidesteps it by using only fresh issues.
It relied on GitHub bugs posted after the model entry deadline. This keeps the test fair and honest. Models couldn’t memorize anything ahead of time, making each answer a genuine attempt at solving something new.
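The idea of filtering out anything a model could have seen during training is simple to picture in code. This is a minimal sketch, not the K Prize's actual pipeline: the deadline date, issue records, and field names here are all hypothetical stand-ins for data that would really come from the GitHub API.

```python
from datetime import datetime, timezone

# Hypothetical model-entry deadline: submissions were frozen before this date.
DEADLINE = datetime(2024, 3, 12, tzinfo=timezone.utc)

# Toy stand-ins for GitHub issues; a real pipeline would pull these from the API.
issues = [
    {"id": 101, "created_at": "2024-01-05T10:00:00+00:00"},  # before the freeze: excluded
    {"id": 202, "created_at": "2024-04-20T08:30:00+00:00"},  # after the freeze: eligible
    {"id": 303, "created_at": "2024-05-01T12:00:00+00:00"},  # after the freeze: eligible
]

def eligible(issue):
    """Only issues filed after the entry deadline can appear in the test set."""
    return datetime.fromisoformat(issue["created_at"]) > DEADLINE

fresh = [i["id"] for i in issues if eligible(i)]
print(fresh)  # [202, 303]
```

Because every eligible issue postdates the freeze, no model in the competition could have memorized its solution during training.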

The K Prize is being compared to an established benchmark called SWE-Bench. While SWE-Bench reuses the same fixed set of problems, this new test changes with every round.
That means no training beforehand, no shortcuts, and no pattern recognition. It’s about real understanding. Can a model look at a brand-new issue and come up with working code under pressure? That’s the big question.

The coding problems weren’t made-up examples or school-style puzzles. They were pulled from actual GitHub issues posted by real developers working on live projects.
That means the problems were unpredictable, sometimes messy, and very hard to fix. AI had to read through unclear code and broken functions and figure out how to help, just like a human programmer.

This wasn’t about copying and pasting pretty-looking code. It was about writing solutions that actually worked. The code had to fix the issue and do it without breaking anything else.
AI models had to think through the logic of the problem, understand the context, and deliver functional results. That’s a far cry from just generating something that looks correct at first glance.
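"Fix the issue without breaking anything else" has a concrete meaning in software: a patch must make the new failing case pass while every existing check still holds. Here is a tiny illustrative example with an invented bug; the `median` functions and the specific test cases are made up for this sketch, not taken from the competition.

```python
# A toy bug report: median() is wrong for even-length lists (hypothetical example).
def median_buggy(xs):
    xs = sorted(xs)
    return xs[len(xs) // 2]             # incorrect for even-length input

def median_fixed(xs):
    """Candidate patch: must fix the reported bug AND preserve old behavior."""
    xs = sorted(xs)
    mid = len(xs) // 2
    if len(xs) % 2:                     # odd length: behavior unchanged
        return xs[mid]
    return (xs[mid - 1] + xs[mid]) / 2  # even length: the actual fix

# The grader's bar: the reported case now passes, and nothing regresses.
assert median_fixed([3, 1, 2]) == 2        # regression check: old behavior intact
assert median_fixed([4, 1, 3, 2]) == 2.5   # the reported bug is fixed
print("patch accepted")
```

A patch that made the new case pass but broke the odd-length case would score zero, which is exactly the kind of "looks right but isn't" output the K Prize is designed to catch.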

Most people think AI is nearly superhuman by now. It can pass law school tests, write books, and make apps. So this low score was a surprising wake-up call.
It showed that AI struggles when faced with unfamiliar bugs. Even basic programming tasks, when taken from real life, can throw off today’s best tools. This test pulled back the curtain in a big way.

There’s been a lot of buzz around AI taking over professional roles like doctors, lawyers, and coders. But results like this show we’re still far from that world.
It's not about killing the hype; it's about keeping things honest. This challenge showed that today's AI is still learning and far from mastering tasks we assume are simple.

Because the test didn’t require huge processing power, it allowed smaller models and independent teams to join. This was a key design choice to break tech industry patterns.
It opens the door for more innovation from unexpected places. You don’t need to work at a billion-dollar lab to compete; you just need smart ideas and the skill to build models that can think clearly.

Scoring 90 percent on this test isn’t just about bragging rights. It’s a way to show that an AI model can handle tough, unscripted coding tasks under real conditions.
The prize money makes it tempting, but the real reward is bigger. It would mean that someone built a tool that can truly help software teams, bug trackers, and even open source projects around the world.

This wasn’t a one-time contest. New rounds will keep happening, each with new issues and harder challenges. It’s a rolling experiment with more chances to learn.
The goal is to watch how AI coding tools improve. Can they adapt? Can developers learn from failure and build stronger models? That ongoing progress is what gives the K Prize lasting value.

This isn’t just tech trivia. These tests help shape the tools that might fix your favorite app or run your smart home. If an AI can’t pass, it shouldn’t code for you.
Building trust in AI starts by testing it honestly. The K Prize makes sure that the models helping us with code can actually do what they promise, no shortcuts or smoke and mirrors.
As AI continues to reshape how we work, learn, and stay competitive in the job market, it’s clear that mastering AI today can protect your career for years to come.

The low scores might look disappointing, but they give us a clear view of what needs to improve. This challenge could guide the future of AI development.
Every new round brings better models and deeper understanding. And that means better tools, stronger code, and maybe someday, AI that earns its place in your favorite software. The race is just beginning.
As the role of engineers evolves alongside the rise of AI-powered tools, adaptability will be key to staying relevant. For an insider’s perspective, see how GitHub CEO reveals the key to thriving as an engineer in the AI coding era.
Think AI can crack 90 percent soon? Drop your prediction in the comments and let us know what you’re rooting for.
This slideshow was made with AI assistance and human editing.
Dan Mitchell has been in the computer industry for more than 25 years, getting started with computers at age 7 on an Apple II.
