Content by Valerie Cutts and Jithin Jose (1)

Building resilient networks for AI supercomputers

Valerie Cutts and Jithin Jose explain how Azure’s Fairwater AI supercomputer network is designed to keep large synchronous training jobs running through routine faults, using Multipath Reliable Connection (MRC), a two-tier multi-plane topology, and static SRv6 source routing.
Community
