Xiaomi Releases MiMo-V2.5-Pro-UltraSpeed: Generation Speed Up 10x, Exceeding 1,000 Tokens per Second

Igeekphone News, June 9th: Xiaomi, in collaboration with TileRT, has officially launched MiMo-V2.5-Pro-UltraSpeed. The company describes it as an industry first: a trillion-parameter large model generating text at 1,000 tokens per second on a single standard 8-GPU general-purpose node.

Peak throughput can reach 1,200 tokens per second. No custom or dedicated chips are required at any stage, which significantly lowers the barrier to deploying ultra-fast AI inference.
A limited-time API service has launched alongside the model. It is priced at three times the original MiMo-V2.5-Pro, but generation is approximately 10 times faster, which gives it a clear price-performance advantage.
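
The pricing trade-off can be worked through with a few lines of arithmetic. The sketch below uses only the figures quoted in the article (1,000 tokens per second, roughly 10 times faster, 3 times the price); the baseline rate is derived from those figures rather than stated, and the 2,000-token response length is a hypothetical value chosen for illustration.

```python
ULTRA_TOKENS_PER_SEC = 1000.0  # sustained rate quoted in the article
SPEEDUP = 10.0                 # "approximately 10 times" faster
PRICE_MULTIPLIER = 3.0         # priced at 3x the original MiMo-V2.5-Pro

# Implied baseline rate (derived from the figures above, not stated directly).
base_tokens_per_sec = ULTRA_TOKENS_PER_SEC / SPEEDUP


def seconds_to_generate(n_tokens: int, tokens_per_sec: float) -> float:
    """Time to stream n_tokens at a constant generation rate."""
    return n_tokens / tokens_per_sec


response_tokens = 2000  # hypothetical response length
base_wait = seconds_to_generate(response_tokens, base_tokens_per_sec)
ultra_wait = seconds_to_generate(response_tokens, ULTRA_TOKENS_PER_SEC)

# Speed gained for each unit of extra price.
speed_per_price = SPEEDUP / PRICE_MULTIPLIER

print(f"baseline wait: {base_wait:.0f} s, UltraSpeed wait: {ultra_wait:.0f} s")
print(f"speed gained per unit of price: {speed_per_price:.2f}x")
```

Under these assumptions the same output still costs three times as much; the advantage lies in the waiting time, which is why the offer mainly suits latency-sensitive workloads.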

Because high-speed inference resources are limited, the service is temporarily available by application only. The trial period runs from June 9th to 23:59 Beijing Time on June 23rd. The platform will prioritize reviewing applications from enterprises and professional developers with real business needs. Ordinary users are free to try the chat function through a dedicated webpage.

Each account may join the queue up to 10 times per day, and a single session lasts at most 30 minutes. A session that stays idle for 5 minutes is disconnected automatically to keep resource allocation fair.

This performance leap comes from the deep co-design of model and system. It rests on three core technical advances:

The first is FP4 quantization. Reflecting the model's MoE architecture, only the expert layers, which account for the majority of the parameters, undergo lossless FP4 quantization; the remaining modules keep their original precision. This reduces memory usage and eases bandwidth pressure while leaving the model's overall capability largely unchanged.
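
To make the idea of 4-bit weight storage concrete, here is a minimal sketch of generic block-wise FP4 quantization. It is not Xiaomi's method: the article does not say which FP4 variant, block size, or scaling scheme is used, so the E2M1 value table, the per-block scale, and the function names `quantize_block` and `dequantize_block` are all assumptions for illustration. Simple round-to-nearest like this does change individual weights slightly.

```python
# Magnitudes representable by the E2M1 4-bit float format.
# Assumption: the article does not name the FP4 variant actually used.
FP4_E2M1 = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def quantize_block(weights):
    """Map one block of weights to (sign, index) codes plus a shared scale."""
    max_abs = max(abs(w) for w in weights)
    # Scale so the largest weight lands on the largest FP4 magnitude.
    scale = max_abs / FP4_E2M1[-1] if max_abs > 0 else 1.0
    codes = []
    for w in weights:
        target = abs(w) / scale
        # Round to the nearest representable magnitude.
        idx = min(range(len(FP4_E2M1)), key=lambda i: abs(FP4_E2M1[i] - target))
        codes.append((1 if w < 0 else 0, idx))  # 1 sign bit + 3-bit index
    return codes, scale


def dequantize_block(codes, scale):
    """Rebuild approximate weights from the 4-bit codes and the block scale."""
    return [(-1.0 if sign else 1.0) * FP4_E2M1[idx] * scale for sign, idx in codes]


block = [0.12, -0.50, 0.03, 0.31]  # hypothetical expert weights
codes, scale = quantize_block(block)
restored = dequantize_block(codes, scale)
max_error = max(abs(a - b) for a, b in zip(block, restored))
print(restored, max_error)
```

Each weight then occupies 4 bits instead of 16, which is where the memory and bandwidth savings come from; restricting this to the expert layers targets the bulk of the parameters while leaving the other modules untouched.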

The second is DFlash block-parallel speculative decoding. It abandons the traditional token-by-token serial decoding mode and predicts an entire block of text at a time. In scenarios such as code and mathematical reasoning, an average of 6-7 tokens are confirmed per round, which significantly improves decoding efficiency.
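
The draft-and-verify loop behind speculative decoding can be sketched in a few lines. This is a generic greedy version with toy stand-in models, not DFlash itself, whose internals the article does not describe; `target_next`, `draft_block`, `speculative_step`, and `generate` are hypothetical names. In a real system the target model checks every position of the block in one parallel forward pass, which the loop below only simulates.

```python
from typing import List


def target_next(ctx: List[int]) -> int:
    """Toy 'large model': always continues with (last token + 1) mod 10."""
    return (ctx[-1] + 1) % 10


def draft_block(ctx: List[int], k: int) -> List[int]:
    """Toy drafter: proposes k tokens at once, wrong wherever the answer is 8."""
    out, last = [], ctx[-1]
    for _ in range(k):
        last = (last + 1) % 10
        out.append(7 if last == 8 else last)  # deliberate draft error
    return out


def speculative_step(prefix: List[int], block_size: int) -> List[int]:
    """One round: keep drafted tokens until the first disagreement."""
    accepted: List[int] = []
    for tok in draft_block(prefix, block_size):
        expected = target_next(prefix + accepted)
        if tok != expected:
            accepted.append(expected)  # the target's token replaces the miss
            break
        accepted.append(tok)
    return accepted


def generate(prefix: List[int], n_tokens: int, block_size: int):
    """Decode n_tokens and count how many verification rounds were needed."""
    out, rounds = list(prefix), 0
    while len(out) - len(prefix) < n_tokens:
        out += speculative_step(out, block_size)
        rounds += 1
    return out[len(prefix):len(prefix) + n_tokens], rounds


tokens, rounds = generate([0], n_tokens=12, block_size=4)
print(tokens, rounds, len(tokens) / rounds)
```

The output is identical to what the target model would produce on its own; the gain is that each round confirms several tokens instead of one, which is the quantity the article reports as 6-7 tokens per round.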

The third is a GPU execution architecture restructured on top of the TileRT inference system. Persistent kernels and heterogeneous pipelines eliminate the latency caused by switching between operators, keeping the hardware's compute units busy at all times.

This extremely fast inference also reshapes AI application scenarios. The higher speed makes it practical for models to reason in parallel and correct their own errors, improving the quality of logical reasoning. It greatly reduces waiting and lag in code generation, unlocking the productivity of programming agents. It also opens the way to deploying trillion-parameter models in real-time decision-making scenarios that demand millisecond latency, such as high-frequency quantitative trading, real-time anti-fraud, and medical image analysis.
