On April 30, Kuaitech reported that DeepSeek had released a new model, DeepSeek-Prover-V2-671B, on the AI open-source platform Hugging Face that afternoon.
According to the model's description, DeepSeek-Prover-V2-671B has 671 billion parameters, uses the more efficient safetensors file format, and supports multiple numerical precisions, including BF16, FP8, and F32, allowing the model to be trained and deployed faster and with fewer resources.
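To make the file-format and precision options concrete, here is a minimal sketch of loading the checkpoint in BF16 with Hugging Face transformers. The repository id and the trust_remote_code flag are assumptions based on DeepSeek's usual release conventions, and a 671B-parameter model would in practice require multi-GPU or offloaded inference, which this sketch glosses over.

```python
# Minimal sketch: loading the released safetensors checkpoint in BF16.
# The repo id below is an assumption, not a detail confirmed in the report.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "deepseek-ai/DeepSeek-Prover-V2-671B"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,   # one of the supported precisions (BF16)
    device_map="auto",            # shard across available GPUs (requires accelerate)
    trust_remote_code=True,       # in case the repo ships custom model code
)
```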
In terms of architecture, the model is built on DeepSeek-V3 and adopts a Mixture-of-Experts (MoE) design, with 61 Transformer layers and a hidden dimension of 7,168.
It also supports ultra-long contexts, with maximum position embeddings of 163,840, allowing it to handle long, complex mathematical proofs. In addition, the checkpoint uses FP8 quantization, which shrinks the model and improves inference efficiency.
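The architectural figures quoted above can be checked directly against the published configuration. The sketch below assumes the standard Hugging Face config field names (num_hidden_layers, hidden_size, max_position_embeddings) and the same assumed repository id; the expected values in the comments are simply the numbers reported here.

```python
# Sketch: reading the reported architecture numbers from the model config.
# Field names follow common Hugging Face conventions and are assumptions,
# not values copied from the actual repository files.
from transformers import AutoConfig

config = AutoConfig.from_pretrained(
    "deepseek-ai/DeepSeek-Prover-V2-671B",  # assumed repo id
    trust_remote_code=True,
)
print(config.num_hidden_layers)        # expected: 61 Transformer layers
print(config.hidden_size)              # expected: 7168 hidden dimension
print(config.max_position_embeddings)  # expected: 163840 context positions
```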
Some online commenters see the model as an upgraded version of Prover-V1.5: it focuses on formal theorem proving, is built specifically to solve mathematical problems, and excels at automated theorem proving and complex computation, in a style reminiscent of AlphaGo's self-play in Go.
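For readers unfamiliar with formal theorem proving, the snippet below is an illustrative Lean 4 statement of the kind such a prover is asked to complete automatically. The theorem is a textbook example chosen for illustration, not one drawn from DeepSeek's materials.

```lean
-- Illustrative Lean 4 theorem: given the formal statement,
-- an automated prover must produce the tactics/proof term after `by`.
theorem add_comm_example (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b
```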
How it performs in upcoming benchmark tests will be worth watching.