Breaking Barriers in Domestic AI Model Development

In January of this year, DeepSeek, a pioneering artificial intelligence company, made waves worldwide with the launch of its general-purpose large model, DeepSeek-R1. The model drew wide attention for combining low cost with high performance, marking a milestone in China's AI development and offering valuable lessons for the industry at large.

DeepSeek’s success rests on a set of technical choices: low-level programming in Parallel Thread Execution (PTX), NVIDIA's intermediate GPU instruction set; a Mixture of Experts (MoE) architecture; Multi-head Latent Attention (MLA); and Multi-Token Prediction (MTP). These techniques allow DeepSeek to deliver strong model performance with less computing power than its international counterparts. Remarkably, the company has cut training costs to about 10% of the industry standard. This achievement lowers the barrier to deploying large models. It also shows that algorithmic optimization can compensate for shortages in computing power. It offers an alternative to the Western approach to AI development, which is often characterized by the mantra of “throwing resources at problems” in the hope of a breakthrough.

Moreover, DeepSeek has adopted a fully open-source strategy, publicly releasing its algorithms, model weights, and training details. This openness lets developers around the world study, refine, and deploy the model, fostering an ecosystem conducive to innovation. The approach could disrupt the typical winner-takes-all competitive landscape in favor of a more collaborative and dynamic development environment.

Despite these advances, China still faces challenges on its path toward original innovation in AI. As of 2023, only one Chinese institution ranked among the ten most cited research institutions in generative AI.

In areas such as AI patents, deep learning models, and machine learning hardware, a noticeable gap remains between China and the United States. The journey is far from complete.

Currently, China's foundational frameworks for data management are still nascent. Mechanisms for data acquisition and exchange are often inadequate, making it difficult for industries to access both industry-specific and public data. As a result, the data available for training large models is limited. High-quality data labeling is essential for producing high-quality datasets, but it is hampered by a shortage of specialized personnel. This shortage is especially acute in sectors such as healthcare and autonomous driving. There, precise, expert-level annotation is urgently needed but difficult to supply.

Globally, the influence of domestically developed large models such as DeepSeek is still in its infancy. At home, the path from fundamental AI research through technical innovation to practical application has not yet been fully established. Bottlenecks in the flow of essential elements such as technology, funding, data, and talent prevent the formation of an efficient ecological loop that could support further iterations of large models.

To address these challenges, it is crucial to strengthen foundational AI research and technological innovation. A concerted effort is needed to build national strategic scientific capabilities in AI. This includes interdisciplinary collaboration that integrates AI with foundational disciplines such as mathematics, physics, and brain science, which would significantly raise the level of foundational AI research. Promoting open-source AI initiatives is also vital. By organizing efforts around open-source projects, contributors, providers, users, and operators can jointly drive technological innovation.

Furthermore, the construction of large-scale datasets must be strategically coordinated.