8 min read

DeepSeek’s latest AI update has people talking, but not for the usual reasons. Speculation, supported by developer analysis, has emerged that DeepSeek’s R1‑0528 may have been trained on synthetic data generated from Google’s Gemini outputs.
That’s like learning by copying answers from the smartest kid in class. This method can supercharge learning speed, but many see it as crossing a line. Companies spend years and billions building these models. Using their work as training material opens up a whole new debate.

AI distillation is like having a personal tutor instead of reading thousands of books. The tutor, in this case, is another powerful AI model. Rather than starting from raw data, the AI learns by studying the polished answers produced by models like Gemini. It copies not just answers, but also how the AI thinks.
This cuts training time dramatically and saves massive amounts of computer power. For smaller companies or those short on hardware, this method offers a fast track. But it blurs the lines between innovation and imitation.
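The teacher–student mechanics can be made concrete. The sketch below is not DeepSeek’s actual pipeline; it’s a toy, pure-Python version of the classic distillation loss, where the student is penalized for diverging from the teacher’s softened probability distribution (the temperature and example logits here are invented for illustration).

```python
import math

def softmax(logits, temperature=1.0):
    # Higher temperature "softens" the distribution, spreading
    # probability mass across answers the teacher considers plausible.
    exps = [math.exp(x / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL divergence between the teacher's softened outputs and the
    # student's: the student learns not just the top answer, but the
    # teacher's relative confidence in every alternative.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

teacher = [4.0, 1.0, 0.5]   # teacher is confident in option 0
student = [2.0, 1.5, 1.0]   # student is less sure
print(distillation_loss(teacher, student))
```

A training loop would minimize this loss over many examples; when the student’s distribution matches the teacher’s exactly, the loss is zero.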

Taking another company’s AI outputs for training raises serious ethical concerns. It’s like building a business off someone else’s hard-earned research. OpenAI and Google have clear rules stating that their AI-generated content can’t be used to create competing products.
If DeepSeek pulled Gemini’s outputs into its own training, it may have stepped into forbidden territory. This isn’t just about fairness; it’s about protecting billions invested in developing these models. The debate over who owns AI-generated content is heating up fast as more companies try these shortcuts.

DeepSeek isn’t new to controversy. When it first appeared, there were whispers that it may have been trained on ChatGPT’s outputs. That earlier situation drew attention because DeepSeek’s training costs were surprisingly low compared to industry giants.
Experts wondered how a newcomer could match top AI models without the same massive investment. The Gemini situation feels like déjà vu for many in the tech world. DeepSeek’s methods keep raising questions, but so far, the company continues pushing forward at full speed.

Usually, AI models consume mountains of raw data to learn. They process books, websites, articles, and even videos. This process requires enormous computer power, time, and specialized hardware called GPUs. Training large models can take months and cost millions.
By reading and analyzing raw data, the AI builds its understanding of language, reasoning, and knowledge. That’s why shortcuts like distillation are so tempting. Instead of starting from scratch, some companies look for ways to speed up the process using pre-chewed information from existing AIs.
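To see what “learning from raw data” means at its simplest, here is a toy sketch. Production LLMs use neural networks and GPUs, not counting tables, but the idea is the same: a bigram model builds a crude next-word predictor purely by tallying a small invented corpus.

```python
from collections import Counter, defaultdict

def train_bigram_model(corpus):
    # A toy stand-in for pretraining: count which word follows which,
    # building a crude predictive model directly from raw text.
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(model, word):
    # Predict the most frequent follower seen in training.
    followers = model.get(word.lower())
    return followers.most_common(1)[0][0] if followers else None

corpus = [
    "the model reads raw data",
    "the model learns patterns",
    "the model reads books",
]
model = train_bigram_model(corpus)
print(predict_next(model, "model"))  # "reads" follows "model" most often
```

Scale this idea up by billions of parameters and trillions of words, and the cost in hardware and time becomes clear.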

Synthetic data isn’t pulled from the real world. Instead, it’s generated by another AI model producing fresh examples on command. By asking an advanced AI like Gemini to answer tons of questions, DeepSeek could create a massive training dataset fast.
This allows the model to practice on high-quality answers without scouring the internet for raw material. For companies with limited hardware but plenty of money, synthetic data becomes an efficient, scalable option. But again, it sparks heated debates about originality and ownership.
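In code, the synthetic-data idea is simple. The sketch below is hypothetical: `ask_teacher` stands in for a real API call to a teacher model and just returns a canned string here.

```python
def ask_teacher(prompt):
    # Hypothetical stand-in for a call to a teacher model's API.
    # A real pipeline would send the prompt to a hosted LLM.
    return f"Detailed answer to: {prompt}"

def build_synthetic_dataset(prompts):
    # Each (prompt, response) pair becomes a training example for the
    # student model -- no web scraping required.
    return [{"prompt": p, "response": ask_teacher(p)} for p in prompts]

prompts = ["Explain photosynthesis simply.", "What is a prime number?"]
dataset = build_synthetic_dataset(prompts)
print(len(dataset), "training examples generated")
```

Run this over millions of prompts and you have a large, uniformly formatted dataset, which is exactly why the approach is so tempting, and why providers forbid it in their terms of service.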

Many believe Gemini played a key role in shaping DeepSeek’s newest version. That would explain the model’s shift in tone and response style. Experts comparing outputs say DeepSeek now echoes Gemini’s unique way of handling complex questions.
The similarities don’t stop at answers; even the AI’s internal thought processes, called traces, mirror Gemini’s patterns. This level of overlap suggests more than coincidence. While DeepSeek hasn’t confirmed it, mounting evidence fuels the theory that Gemini quietly served as an unofficial teacher.

AI models leave behind clues when they work, much like footprints in the sand. These traces show how the AI moves from question to answer. Developers who study these traces noticed DeepSeek’s steps resemble Gemini’s approach closely.
Forensic analysis of these thought paths can reveal where models might have drawn inspiration or data. It’s not just about the final answer but the reasoning process behind it. The deeper experts dig, the more they find Gemini’s fingerprints on DeepSeek’s latest model.
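Real forensic analysis of reasoning traces is far more sophisticated, but a crude version of the question “do these traces phrase their steps the same way?” can be sketched with word n-gram overlap. The example traces below are invented, not actual model outputs.

```python
def ngrams(text, n=3):
    # Break a trace into overlapping word trigrams.
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def trace_similarity(trace_a, trace_b, n=3):
    # Jaccard overlap of trigrams: a rough fingerprint of whether two
    # reasoning traces use the same phrasing for the same steps.
    a, b = ngrams(trace_a, n), ngrams(trace_b, n)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

trace_one = "first restate the problem then break it into smaller steps"
trace_two = "first restate the problem then solve each part in turn"
print(trace_similarity(trace_one, trace_two))
```

A score of 1.0 means identical phrasing; scores well above what unrelated models produce would be the kind of statistical “fingerprint” analysts look for.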

AI experts like Nathan Lambert believe using Gemini makes strategic sense for DeepSeek. The company faces limits on hardware but has financial resources. Training with Gemini’s outputs helps stretch its computing power without needing more expensive GPUs.
Lambert even admitted that, if he were in DeepSeek’s shoes, he’d consider a similar approach. When access to advanced chips is restricted, companies look for every advantage. This tactic may be controversial, but for some, it’s simply smart business in a highly competitive field.

Trade restrictions between the U.S. and China block access to cutting-edge chips and AI hardware. Without top-tier GPUs, Chinese companies like DeepSeek face serious challenges when building large AI models. This global tech tension pushes them to find alternative training methods, like distillation from existing AI models.
By generating synthetic data through models like Gemini, they can continue advancing despite hardware shortages. These restrictions aren’t just slowing innovation; they’re reshaping how companies approach AI development altogether.

While distillation offers speed, it also carries heavy risks. Companies could face lawsuits or bans if caught violating others’ terms of service. OpenAI and Google invest heavily to protect their intellectual property, and they don’t take kindly to potential misuse.
Legal battles over AI data usage are heating up worldwide. If companies like DeepSeek cross legal boundaries, they might face serious consequences that could affect their future growth and reputation.

Some see DeepSeek’s strategy as clever survival in a tough market, while others call it outright theft. Training AI requires creativity, technical skill, and lots of resources, even with shortcuts like distillation.
But using another company’s finished work blurs ethical lines and raises uncomfortable questions. The debate over what counts as innovation versus imitation is far from settled. As AI competition grows, these gray areas will only get messier.

Big tech companies guard their AI models like gold. Billions of dollars and years of research go into building these systems. If others can freely copy their models’ outputs, it threatens their business model and future profits.
Protecting AI outputs is as critical as guarding trade secrets or patented inventions. This is why companies include strict rules in their terms of service to stop others from using their AI responses for training purposes.

The race to build the smartest AI isn’t slowing down. Every company wants to leap ahead of the competition. DeepSeek’s rumored distillation of Gemini is just one example of how intense this competition has become.
Smaller players try risky shortcuts to keep up with tech giants that have bigger budgets and more hardware. As the AI arms race heats up, creative and sometimes questionable strategies are becoming more common across the industry.

Right now, AI copyright laws are still developing. Courts and lawmakers are scrambling to catch up with technology. Future regulations could make distillation from other companies’ models illegal and punishable by heavy penalties.
Clear rules would bring stability but also limit creative shortcuts that some companies rely on. For now, companies like DeepSeek operate in a legal gray area, hoping regulators don’t clamp down too hard, too soon.

Despite the buzz, DeepSeek keeps pushing ahead, building better models and attracting attention. The controversy hasn’t slowed its growth or ambitions.
As more eyes turn to its methods, DeepSeek might face future legal or ethical battles. For now, it’s charging forward, part of a fast-moving AI industry where innovation, competition, and controversy often collide.
This slideshow was made with AI assistance and human editing.
Dan Mitchell has been in the computer industry for more than 25 years, getting started with computers at age 7 on an Apple II.