EMMA: Efficient Multi-node Memory-aware AllReduce Algorithms (poster)

Authors: Guerrini, V., Fan, K., Kumar, S.

Publication: The 12th Greater Chicago Area Systems Research Workshop (GCASR), Chicago, IL

URL: https://gcasr.org/2025/posters

AllReduce is a critical collective operation in both HPC and large-scale AI workloads. However, scaling it to Exascale systems is difficult because inter-node communication becomes a bottleneck while intra-node resources such as shared memory and NVLink are underutilized. This work analyzes state-of-the-art AllReduce algorithms to identify inefficiencies and opportunities for hybrid strategies that explicitly separate intra- and inter-node communication.

We introduce a preliminary algorithmic design that leverages tunable intra-node communication patterns and discuss key performance criteria, including message count and data volume. Our early results provide insight into communication trade-offs and guide the development of adaptive AllReduce implementations optimized for Exascale systems.
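
To make the two criteria named above (message count and data volume) concrete, the following is a minimal illustrative sketch in Python. It is not the authors' algorithm: it simulates a generic two-level AllReduce (intra-node reduce to a node leader, exchange among leaders, intra-node broadcast) and compares it with a naive flat baseline. All names (`Traffic`, `flat_allreduce`, `hierarchical_allreduce`) are hypothetical, ranks are assumed to be placed contiguously on nodes, and the leader exchange is a simple gather-and-broadcast through one leader rather than an optimized ring or tree.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Traffic:
    """Counts messages and elements sent, split by intra- vs inter-node."""
    intra_msgs: int = 0
    inter_msgs: int = 0
    intra_elems: int = 0
    inter_elems: int = 0

    def send(self, src: int, dst: int, n_elems: int, ranks_per_node: int) -> None:
        # Assumption: ranks are placed contiguously, ranks_per_node per node.
        if src // ranks_per_node == dst // ranks_per_node:
            self.intra_msgs += 1
            self.intra_elems += n_elems
        else:
            self.inter_msgs += 1
            self.inter_elems += n_elems


def vec_add(a: List[float], b: List[float]) -> List[float]:
    return [x + y for x, y in zip(a, b)]


def flat_allreduce(data: List[List[float]], ranks_per_node: int) -> Tuple[List[List[float]], Traffic]:
    """Naive baseline: every rank sends to rank 0, which sends the sum back."""
    t = Traffic()
    total = list(data[0])
    for r in range(1, len(data)):
        t.send(r, 0, len(data[r]), ranks_per_node)
        total = vec_add(total, data[r])
    for r in range(1, len(data)):
        t.send(0, r, len(total), ranks_per_node)
    return [list(total) for _ in data], t


def hierarchical_allreduce(data: List[List[float]], ranks_per_node: int) -> Tuple[List[List[float]], Traffic]:
    """Two-level sketch: intra-node reduce, leader exchange, intra-node broadcast."""
    t = Traffic()
    n_ranks = len(data)
    leaders = list(range(0, n_ranks, ranks_per_node))

    # Phase 1: each node reduces onto its leader (first rank on the node).
    partial = {}
    for lead in leaders:
        acc = list(data[lead])
        for r in range(lead + 1, min(lead + ranks_per_node, n_ranks)):
            t.send(r, lead, len(data[r]), ranks_per_node)
            acc = vec_add(acc, data[r])
        partial[lead] = acc

    # Phase 2: leaders combine their partial sums through the first leader.
    total = list(partial[leaders[0]])
    for lead in leaders[1:]:
        t.send(lead, leaders[0], len(partial[lead]), ranks_per_node)
        total = vec_add(total, partial[lead])
    for lead in leaders[1:]:
        t.send(leaders[0], lead, len(total), ranks_per_node)

    # Phase 3: each leader broadcasts the global sum within its node.
    for lead in leaders:
        for r in range(lead + 1, min(lead + ranks_per_node, n_ranks)):
            t.send(lead, r, len(total), ranks_per_node)

    return [list(total) for _ in data], t
```

Under these assumptions, with 4 nodes of 4 ranks each, the flat baseline sends 24 inter-node messages while the two-level variant sends 6, at the cost of more intra-node messages (24 instead of 6). This is the kind of trade-off that separating intra- and inter-node phases exposes; real implementations would replace each phase with tuned patterns.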

Date: May 8, 2025
