Intel® XPU Manager
A solution for monitoring and managing your Intel® Data Center GPUs
Overview
Intel® XPU Manager is a free and open-source solution for local and remote monitoring and management of Intel® Data Center GPUs. It is designed to simplify administration, maximize reliability and uptime, and improve utilization.
Intel® XPU Manager extends the capabilities of the Intel® oneAPI Level Zero System Management (Sysman) APIs, and can be used standalone through its command line interface (CLI) to manage GPUs locally, or through its RESTful API to manage GPUs remotely.
Users can download the Intel® XPU Manager packages for different OSes from GitHub along with the source code.
Third-party and commercial workload and cluster managers, job schedulers, and monitoring solutions can also integrate the Intel® XPU Manager library to support Intel® Data Center GPUs.
Additionally, a Docker container image released on DockerHub can export telemetry from Intel® Data Center GPUs within a Kubernetes cluster.
Intel® XPU System Management Interface (SMI) is a daemon-less version of Intel® XPU Manager that provides only a command line interface.
Intel® XPU Manager Downloads
Architecture and Integration
Intel® XPU Manager is designed to run on each node of a cluster. It communicates with the Intel® oneAPI Level Zero Sysman library and the Intel® Data Center GPU driver, as shown in the architectural diagram below on the left.
The Docker container image exports Intel® Data Center GPU telemetry in a format that is compatible with Prometheus and can be visualized within Grafana.
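As a rough illustration of what "compatible with Prometheus" means in practice, the sketch below parses a few lines of the Prometheus text exposition format into Python tuples. The metric names and the `dev_id` label in `SAMPLE` are hypothetical placeholders, not the names the exporter actually emits, and the parser is deliberately minimal: it ignores timestamps and does not handle commas or escaped quotes inside label values.

```python
from typing import Dict, List, Tuple

# Hypothetical sample payload. The metric and label names are placeholders
# chosen for illustration; they are NOT the exporter's real metric names.
SAMPLE = """\
# HELP example_gpu_power_watts Illustrative power reading.
# TYPE example_gpu_power_watts gauge
example_gpu_power_watts{dev_id="0"} 61.5
example_gpu_power_watts{dev_id="1"} 58.0
example_gpu_utilization_percent{dev_id="0"} 42
"""

Sample = Tuple[str, Dict[str, str], float]


def parse_metrics(text: str) -> List[Sample]:
    """Parse simple 'name{labels} value' lines into (name, labels, value)."""
    samples: List[Sample] = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and '# HELP' / '# TYPE' comment lines.
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated field on the line.
        series, _, value = line.rpartition(" ")
        name, _, rest = series.partition("{")
        labels: Dict[str, str] = {}
        if rest:
            for pair in rest.rstrip("}").split(","):
                if pair:
                    key, _, val = pair.partition("=")
                    labels[key.strip()] = val.strip().strip('"')
        samples.append((name, labels, float(value)))
    return samples
```

In a real deployment Prometheus scrapes and parses this format itself; a hand-written parser like this one is mainly useful for quick checks of an exporter's output.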
Intel® XPU System Management Interface (SMI) is installed in the same way as Intel® XPU Manager. It can be used through its CLI, but since it is daemon-less, it does not provide RESTful APIs or a library for integration.
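Because both tools can be driven through a CLI, a script can wrap that CLI to collect a device inventory. The following is a minimal sketch under stated assumptions, not a documented integration path: it assumes the executable is named `xpumcli` (or `xpu-smi` for the daemon-less tool), that a `discovery` subcommand with a `-j` JSON flag is available, and that the JSON has a top-level `device_list` array. Check each of these against the CLI help of the installed version before relying on them.

```python
import json
import subprocess
from typing import Any, Dict, List


def build_command(tool: str = "xpumcli") -> List[str]:
    """Build the argument list for a device discovery call.

    Assumption: the CLI offers a `discovery` subcommand and a `-j` flag
    that switches the output to JSON.
    """
    return [tool, "discovery", "-j"]


def parse_discovery(raw: str) -> List[Dict[str, Any]]:
    """Extract the device entries from the CLI's JSON output.

    Assumption: devices are listed under a top-level "device_list" key.
    Returns an empty list if that key is absent.
    """
    return json.loads(raw).get("device_list", [])


def list_devices(tool: str = "xpumcli") -> List[Dict[str, Any]]:
    """Run the CLI and return the parsed device entries.

    Raises FileNotFoundError if the tool is not installed and
    subprocess.CalledProcessError if it exits with a non-zero status.
    """
    result = subprocess.run(
        build_command(tool), capture_output=True, text=True, check=True
    )
    return parse_discovery(result.stdout)
```

On a node where the tool is installed, `list_devices()` or `list_devices("xpu-smi")` would return one dictionary per GPU. Keeping command construction and parsing in separate functions lets the parsing logic be exercised without GPU hardware.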
Values and Benefits
Simplifies GPU Administration
- Detailed inventory & topology
- Customizable groupings
- Comprehensive settings
- Integratable into other solutions
Maximizes GPU Reliability & Uptime
- Comprehensive health monitoring
- Aggregated telemetry
- Real-time alerting
- Multi-level diagnostics & stress tests
Improves GPU Utilization
- Fine-grained GPU statistics
- Different performance metrics
- Customizable policies and thresholds
- Accurate power & clock management
Performs Firmware Updates
- Reliable updating for GPU components
- Simplifies firmware updating for multiple GPUs
- OS level firmware updating
- Supports most major operating systems
Features | Intel® XPU Manager | Intel® XPU System Management Interface |
---|---|---|
Architecture | Daemon based | Daemon-less |
Interfaces | CLI, Remote HTTPS, Local SSH, Local Library | CLI |
Discovery, Inventory, Topology | Yes | Yes |
GPU Grouping | Yes | No |
GPU statistics/metrics | Yes (aggregated system-level) | Yes (real-time per GPU) |
GPU Configuration | Yes | Yes* |
GPU Policies | Yes (per Group or GPU) | No |
GPU Health & Diagnostics | Yes (per Group or GPU) | Yes (per GPU) |
Firmware updating | Yes (per GPU) | Yes (per GPU) |
Supported OSes | Linux: Ubuntu 20.04.3, RHEL 8.4, CentOS 8 Stream, SLES 15 SP3; Windows: Windows Server 2022 (limited features) | Linux: Ubuntu 20.04.3, RHEL 8.4, CentOS 8 Stream, SLES 15 SP3 |
Frameworks | Prometheus exporter, Docker container, Icinga plugin | N/A |
CLI Screenshots
Frequently Asked Questions (FAQ)
Intel® XPU Manager can be used standalone through its command line interface (CLI) to manage GPUs locally, or through its RESTful API to manage GPUs remotely. It can also be integrated as a library into third-party solutions. Intel® XPU System Management Interface (SMI) is a daemon-less version of Intel® XPU Manager that can be used through its CLI.
We are working with a variety of Independent Software Vendors (ISVs) that develop cluster and workload managers, resource schedulers, and monitoring solutions to integrate Intel XPU Manager to support Intel Data Center GPUs.
Yes, Intel is also releasing a Docker container image that allows Intel XPU Manager to be used within a Kubernetes environment. It provides GPU telemetry that is compatible with Prometheus and can be visualized in Grafana.
The source code and packages for Ubuntu, Red Hat Enterprise Linux, CentOS, SUSE Linux Enterprise Server, and Windows can be downloaded from GitHub, and the Docker container image is available on DockerHub.
Intel® XPU Manager supports all Intel® Data Center GPUs.
- Intel® XPU Manager currently supports Ubuntu 20.04.3 and 22.04, RHEL 8.5 and 8.6, CentOS 7.4 and 7.9, CentOS 8 Stream, SLES 15 SP3, as well as Windows Server 2019 and 2022 (with limited features).
- Intel® XPU System Management Interface currently supports Ubuntu 20.04.3 and 22.04, RHEL 8.6, CentOS 8 Stream, and SLES 15 SP4.
Related Links
Intel® oneAPI
Intel® Data Center GPU Flex Series
Intel® XPU Manager on GitHub
Intel® XPU Manager Docker Image on DockerHub
For more information, please contact: xpumsupport@intel.com