DeepSeek's new chatbot greets users with an ambitious introduction: "Hi, I was created so you can ask anything and get an answer that might even surprise you." Developed by the Chinese startup DeepSeek, the AI quickly became a major market player and even contributed to a sharp drop in Nvidia's stock price.

DeepSeek's success stems from its innovative architecture and training methods. Key technologies include:
- Multi-token Prediction (MTP): Rather than predicting only the next token, MTP trains the model to predict several upcoming tokens at each position, which improves both accuracy and efficiency.
- Mixture of Experts (MoE): Each MoE layer in DeepSeek V3 contains 256 expert sub-networks, and only eight of them are activated for each token. Because just a fraction of the model's parameters is computed per token, training is faster and performance improves.
- Multi-head Latent Attention (MLA): MLA compresses the attention keys and values into a compact latent representation, which greatly reduces the memory needed during inference while preserving the model's ability to focus on the most important parts of the input.
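To make the Mixture of Experts idea concrete, here is a minimal Python sketch of top-k routing using the article's numbers (256 experts, 8 active per token). It is a simplified, hypothetical illustration, not DeepSeek's code: the experts are toy scalar functions, the router scores are random stand-ins for a learned router, and the gate is a plain softmax over the selected experts. DeepSeek V3's actual gating function (and its additional shared expert) differs in detail.

```python
import math
import random

NUM_EXPERTS = 256  # routed experts per MoE layer, per the article
TOP_K = 8          # experts activated for each token, per the article


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(scores, k=TOP_K):
    """Return (expert_index, gate_weight) pairs for the k highest-scoring experts."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    weights = softmax([scores[i] for i in top])  # renormalize over the chosen experts only
    return list(zip(top, weights))


def moe_forward(x, experts, scores, k=TOP_K):
    """Run only the selected experts and sum their outputs, weighted by the gate."""
    return sum(w * experts[i](x) for i, w in route_token(scores, k))


# Hypothetical toy experts: expert i simply scales its input by (i + 1).
experts = [lambda x, scale=i + 1: scale * x for i in range(NUM_EXPERTS)]

rng = random.Random(0)
router_scores = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # stand-in for a learned router
chosen = route_token(router_scores)

print(len(chosen))                          # 8
print(round(sum(w for _, w in chosen), 6))  # 1.0
print(moe_forward(1.0, experts, router_scores))
```

Only the eight selected experts do any work for this token; the other 248 are skipped entirely, which is where the compute savings come from.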

DeepSeek initially claimed a remarkably low training cost of about $6 million for DeepSeek V3, trained on 2,048 H800 GPUs. SemiAnalysis, however, found far more substantial infrastructure: approximately 50,000 Nvidia Hopper GPUs, including 10,000 H800s, 10,000 H100s, and additional H20s, spread across multiple data centers. It estimates the total server investment at roughly $1.6 billion, with operating expenses of about $944 million.
DeepSeek, a subsidiary of the Chinese hedge fund High-Flyer, owns its data centers, which gives it more control and lets it put innovations into practice faster than competitors that rely on cloud providers. Being self-funded also makes the company more flexible and speeds up decision-making. It attracts top talent as well, recruiting primarily from leading Chinese universities and paying some researchers more than $1.3 million a year.
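To put the gap between the headline claim and the SemiAnalysis estimates in perspective, the article's own figures can be compared directly. This is back-of-the-envelope arithmetic only: it takes the quoted numbers as given and ignores the fact that a GPU fleet of this size serves more than a single training run.

```python
# Dollar and GPU figures quoted in the article.
CLAIMED_TRAINING_COST = 6_000_000   # DeepSeek's headline V3 figure (USD)
CLAIMED_GPUS = 2_048                # GPUs cited alongside that figure
ESTIMATED_GPUS = 50_000             # SemiAnalysis: ~50,000 Hopper GPUs
SERVER_CAPEX = 1_600_000_000        # SemiAnalysis: ~$1.6B server investment
OPERATING_COST = 944_000_000        # SemiAnalysis: ~$944M operating expenses


def cost_comparison():
    """Ratios between the headline claim and the SemiAnalysis estimates."""
    total_spend = SERVER_CAPEX + OPERATING_COST
    return {
        "gpu_ratio": ESTIMATED_GPUS / CLAIMED_GPUS,
        "claim_share_of_capex": CLAIMED_TRAINING_COST / SERVER_CAPEX,
        "claim_share_of_total": CLAIMED_TRAINING_COST / total_spend,
    }


summary = cost_comparison()
print(f"GPU fleet vs. training cluster: {summary['gpu_ratio']:.1f}x")                   # 24.4x
print(f"Headline cost as share of server capex: {summary['claim_share_of_capex']:.1%}")  # 0.4%
print(f"Headline cost as share of capex + opex: {summary['claim_share_of_total']:.1%}")  # 0.2%
```

In other words, the widely quoted $6 million amounts to well under one percent of the estimated hardware spend.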

DeepSeek's initial $6 million figure covers only the GPU time of the final training run; it excludes prior research and experiments, data processing, and infrastructure. The company's total investment in AI development exceeds $500 million. Even so, its lean structure lets it innovate more efficiently than larger, more bureaucratic companies.
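The headline number is easiest to understand as rental-equivalent GPU time: GPU-hours multiplied by an hourly price. The sketch below assumes figures from DeepSeek's V3 technical report rather than from this article, namely about 2.788 million H800 GPU-hours for the full training run priced at an assumed rental rate of $2 per GPU-hour. Hardware ownership, research, and salaries never enter this formula, which is why it understates total spending.

```python
# Assumed inputs (from DeepSeek's V3 technical report, not this article):
# ~2.788M H800 GPU-hours for the full training run, at an assumed $2/GPU-hour.
GPU_HOURS = 2_788_000
RATE_PER_GPU_HOUR = 2.0
CLUSTER_GPUS = 2_048  # cluster size quoted alongside the headline cost


def rental_cost(gpu_hours, rate):
    """Cost if every GPU-hour were rented at a flat hourly rate."""
    return gpu_hours * rate


def wall_clock_days(gpu_hours, gpus):
    """Elapsed days if the GPU-hours were spread evenly across the cluster."""
    return gpu_hours / gpus / 24


cost = rental_cost(GPU_HOURS, RATE_PER_GPU_HOUR)
days = wall_clock_days(GPU_HOURS, CLUSTER_GPUS)
print(f"Rental-equivalent cost: ${cost / 1e6:.2f}M")                # $5.58M
print(f"Wall-clock time on {CLUSTER_GPUS} GPUs: {days:.0f} days")  # 57 days
```

This is how a cluster of roughly 2,000 GPUs running for about two months yields a figure near $6 million, even though the company's overall investment is orders of magnitude larger.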

DeepSeek's success shows that a well-funded, independent AI company can compete with industry giants. While the "revolutionary budget" claim is arguably exaggerated, its achievements are undeniable, especially given the far higher costs incurred by competitors: DeepSeek reportedly spent $5 million on R1, while GPT-4 cost $100 million. Despite the substantial overall investment, this points to genuine cost efficiency.