Operation of a State-of-the-Art HPC Cluster for DSgenAI

FAU will maintain and administer a large-scale GPU-accelerated HPC cluster hosted at NHR@FAU. The system will be cooled with hot water, and its waste heat will be reused for facility heating. The in-house developed ClusterCockpit framework provides continuous, system-wide, job-specific hardware performance monitoring. Users, support personnel, and team managers can access job information and performance metrics via a modern web interface. This data will also be used to generate aggregate statistics on job efficiency and overall energy usage. Adaptive control of system energy parameters aims to optimize energy usage.

Author: Dr. Jan Eitzinger
