EMMA: Efficient Multi-node Memory-aware AllReduce Algorithms (poster)
Authors: Guerrini, V., Fan, K., Kumar, S.
Publication: The 12th Greater Chicago Area Systems Research Workshop (GCASR), Chicago, IL
URL: https://gcasr.org/2025/posters

AllReduce is a critical collective in both HPC and large-scale AI workloads. However, scaling it to Exascale systems presents key challenges due to inter-node communication bottlenecks and underutilization of intra-node resources like shared memory and NVLink. This work analyzes state-of-the-art AllReduce algorithms to identify inefficiencies and opportunities for hybrid strategies that explicitly separate intra- and inter-node communication. We introduce a preliminary algorithmic design that leverages tunable intra-node communication patterns and discuss key performance criteria, including message count and data volume. Our early results provide insight into communication trade-offs and guide the development of adaptive AllReduce implementations optimized for Exascale systems.

Date: May 8, 2025
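
The idea of separating intra- and inter-node communication, and of counting messages as a performance criterion, can be illustrated with a small simulation. The sketch below is not the EMMA design; it is a generic three-phase hierarchical sum-AllReduce (local reduce, leader exchange, local broadcast), assuming one value per rank, one leader per node, and a naive all-to-all exchange in both the leader phase and the flat baseline. The function names and the message-counting model are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple


def hierarchical_allreduce(
    node_data: List[List[int]],
) -> Tuple[List[List[int]], int, int]:
    """Simulate a sum-AllReduce with one value per rank.

    node_data[n][r] is the value held by local rank r on node n.
    Returns (result per rank, intra-node messages, inter-node messages).
    Illustrative model only: each point-to-point transfer counts as one message.
    """
    intra_msgs = 0
    inter_msgs = 0

    # Phase 1: each node reduces its local values to a leader (local rank 0).
    # Every non-leader rank sends one message over the intra-node fabric.
    partial_sums = []
    for ranks in node_data:
        partial_sums.append(sum(ranks))
        intra_msgs += len(ranks) - 1

    # Phase 2: leaders exchange partial sums across nodes.
    # Assumption: naive all-to-all among leaders, so N * (N - 1) messages.
    num_nodes = len(partial_sums)
    total = sum(partial_sums)
    inter_msgs += num_nodes * (num_nodes - 1)

    # Phase 3: each leader broadcasts the global sum to its local ranks.
    result = []
    for ranks in node_data:
        result.append([total] * len(ranks))
        intra_msgs += len(ranks) - 1

    return result, intra_msgs, inter_msgs


def flat_alltoall_inter_node_messages(num_nodes: int, ranks_per_node: int) -> int:
    """Inter-node messages if every rank sends its value to every other rank."""
    total_ranks = num_nodes * ranks_per_node
    remote_peers = total_ranks - ranks_per_node  # peers on other nodes
    return total_ranks * remote_peers


if __name__ == "__main__":
    data = [[1, 2, 3, 4], [5, 6, 7, 8]]  # 2 nodes, 4 ranks each
    result, intra, inter = hierarchical_allreduce(data)
    print("result per rank:", result)
    print("hierarchical intra/inter messages:", intra, inter)
    print("flat inter-node messages:", flat_alltoall_inter_node_messages(2, 4))
```

Under this simplified model, the hierarchical variant trades a larger number of cheap intra-node messages for far fewer inter-node messages. A realistic comparison would also need to account for data volume per message and the actual inter-node algorithm (for example, ring or recursive doubling), which this sketch does not model.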