Microsoft unveils Maia 200 for inference acceleration
Microsoft has unveiled its Maia 200 AI inference accelerator. Designed for low-precision inference workloads, it features 216GB of HBM3e memory with 7 TB/s of bandwidth. The accelerator is part of Azure’s US Central deployment, with an SDK preview integrating PyTorch and a Triton compiler.


Microsoft has introduced the Maia 200, its new AI inference accelerator, built on TSMC’s 3nm process. Engineered for low-precision inference tasks, the chip features FP8 and FP4 tensor processing and pairs 216GB of HBM3e at 7 TB/s with 272MB of on-chip SRAM.

Microsoft claims over 10 petaFLOPS in FP4 and over 5 petaFLOPS in FP8 performance within a 750W SoC TDP envelope. A distinctive two-tier scale-up network utilising standard Ethernet and a custom transport layer supports 2.8 TB/s bidirectional bandwidth per accelerator.
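For a rough sense of what the quoted figures imply, the arithmetic below derives compute-per-watt and the FLOPs-per-byte machine balance from the article's numbers. The input constants come from the article; the derived ratios are illustrative back-of-envelope estimates, not Microsoft figures.

```python
# Spec figures as quoted in the article (lower bounds, since Microsoft
# states "over" 10 and 5 petaFLOPS).
PFLOPS_FP4 = 10.0   # petaFLOPS at FP4
PFLOPS_FP8 = 5.0    # petaFLOPS at FP8
TDP_W = 750         # SoC TDP in watts
HBM_BW_TBS = 7.0    # HBM3e bandwidth in TB/s

# Compute efficiency: teraFLOPS delivered per watt at each precision.
tflops_per_watt_fp4 = PFLOPS_FP4 * 1000 / TDP_W
tflops_per_watt_fp8 = PFLOPS_FP8 * 1000 / TDP_W

# Machine balance: peak FLOPs available per byte of HBM traffic at FP8.
flops_per_byte_fp8 = (PFLOPS_FP8 * 1e15) / (HBM_BW_TBS * 1e12)

print(f"{tflops_per_watt_fp4:.1f} TFLOPS/W at FP4")
print(f"{tflops_per_watt_fp8:.1f} TFLOPS/W at FP8")
print(f"~{flops_per_byte_fp8:.0f} FLOPs per HBM byte at FP8")
```

A balance point in the hundreds of FLOPs per byte suggests that, as with other inference accelerators in this class, memory bandwidth rather than peak compute would bound kernels of low arithmetic intensity.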

Initial deployment is underway in Microsoft’s US Central data centre near Des Moines, with plans to expand to other regions.

Read the full article on IN Electronics & Design.


Stories for you


  • Polymer Comply backs European plastics campaign

    Polymer Comply Europe has backed a campaign for regional recycling. The move adds another industry voice to calls for stronger European plastics recovery and reuse capacity.


  • Data centres lag on AI power visibility

    AI growth is exposing weak power visibility in data centres. New survey findings suggest many operators still lack the monitoring needed to scale dense compute loads safely.