Microsoft has introduced the Maia 200, its new AI inference accelerator, built on TSMC's 3nm process. Engineered for low-precision inference, the accelerator features FP8 and FP4 tensor processing, paired with a memory system comprising 216GB of HBM3e at 7 TB/s and 272MB of on-chip SRAM.
Microsoft claims over 10 petaFLOPS of FP4 and over 5 petaFLOPS of FP8 performance within a 750W SoC TDP envelope. A distinctive two-tier scale-up network, built on standard Ethernet with a custom transport layer, supports 2.8 TB/s of bidirectional bandwidth per accelerator.
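To put the claimed peaks in context, a simple roofline calculation shows the arithmetic intensity a workload would need before the chip becomes compute-bound rather than memory-bound. This is a back-of-envelope sketch using only the figures quoted above (Microsoft's claimed peaks, not measured results); the model ignores the on-chip SRAM and scale-up network entirely.

```python
# Back-of-envelope roofline figures from the published Maia 200 peaks.
# All constants are Microsoft's claimed numbers, not measurements.

PEAK_FP4_FLOPS = 10e15   # >10 petaFLOPS (FP4)
PEAK_FP8_FLOPS = 5e15    # >5 petaFLOPS (FP8)
HBM_BANDWIDTH = 7e12     # 7 TB/s HBM3e

def ridge_point(peak_flops: float, bandwidth_bytes_per_s: float) -> float:
    """Arithmetic intensity (FLOPs per byte of HBM traffic) at which a
    simple roofline model crosses from memory-bound to compute-bound."""
    return peak_flops / bandwidth_bytes_per_s

fp4_ridge = ridge_point(PEAK_FP4_FLOPS, HBM_BANDWIDTH)
fp8_ridge = ridge_point(PEAK_FP8_FLOPS, HBM_BANDWIDTH)
print(f"FP4 ridge point: {fp4_ridge:.0f} FLOPs/byte")
print(f"FP8 ridge point: {fp8_ridge:.0f} FLOPs/byte")
```

The high ridge points (roughly 1,429 FLOPs/byte at FP4 and 714 at FP8) are typical of inference accelerators that pair very dense low-precision compute with HBM, and help explain the unusually large 272MB on-chip SRAM.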
Initial deployment is underway in Microsoft’s US Central data centre near Des Moines, with plans to expand to other regions.
Read the full article on IN Electronics & Design.