OAC Core: AIMCI: Artificial Intelligence for Managing Cyberinfrastructure

Manual, static, single-cluster resource management -> Automated, dynamic, facility-wide resource management

Authors: Lan, Z., Papka, M. E.

Publication: 2025 NSF CSSI/Cybertraining/SCIPE PI Meeting, Denver, CO

URL: https://confmeet.github.io/2025NSFCyberPI/

Advanced cyberinfrastructure (CI) is undergoing disruptive changes in both system architectures and application workloads. CI workloads are rapidly expanding beyond traditional computational simulations to a hybrid mix of applications. CI facilities now host diverse high-performance systems with heterogeneous configurations, resulting in a complex mix of compute, memory, and storage components. Existing CI management methods, which rely heavily on heuristics or manual intervention, struggle to keep up with these changes. This project addresses the challenges of CI resource management by integrating artificial intelligence (AI) technologies with human expertise. The proposed AIMCI framework moves from managing isolated single clusters to facility-wide coordination, orchestrating the entire facility as a unified pool of diverse resources for a broad spectrum of applications with varying resource requirements.

Date: July 27, 2025 - July 29, 2025
