Installing the NVIDIA driver and cuda-toolkit on CentOS Stream 9

Published: 2025-04-13

Driver installation

Reference: Installing the NVIDIA driver on CentOS Stream 9

1. Update the system

First, make sure your system is up to date:

sudo dnf update -y

2. NVIDIA GPU driver installation

Check whether the system has an NVIDIA GPU

You can check whether your computer has an NVIDIA GPU with the following command:

lspci | egrep 'VGA|3D'

As the output below shows, my computer has an NVIDIA GeForce RTX 3060 GPU. Yours may be a different NVIDIA model.

[root@cheng ~]# lspci | egrep 'VGA|3D'
06:00.0 VGA compatible controller: NVIDIA Corporation GA106 [GeForce RTX 3060 Lite Hash Rate] (rev a1)
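
If you want to script this check instead of reading the output by eye, a minimal sketch could look like the following. The function name has_nvidia_gpu is my own invention and not part of any tool; it only reads lspci output on stdin and looks for an NVIDIA entry among the VGA/3D controllers.

```shell
# Hypothetical helper (not part of any tool): succeeds if the lspci output
# on stdin lists an NVIDIA VGA or 3D controller.
has_nvidia_gpu() {
    grep -E 'VGA|3D' | grep -qi 'nvidia'
}

# Usage on a real machine:
#   lspci | has_nvidia_gpu && echo "NVIDIA GPU found"
```

The function reads from stdin rather than calling lspci itself, so it can be tried against saved output from another machine.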

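By default, CentOS Stream 9 uses the open-source Nouveau GPU driver rather than the proprietary NVIDIA GPU driver. Once the proprietary NVIDIA driver is installed, it is used in place of Nouveau. You can check which of the two is loaded with the following commands: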

lsmod | grep nouveau
lsmod | grep nvidia

On my machine the proprietary driver was already installed when I ran these commands, so the nouveau query prints nothing and the nvidia query lists the loaded modules:

[root@cheng ~]# lsmod | grep nouveau
[root@cheng ~]# lsmod | grep nvidia
nvidia_drm            143360  0
nvidia_modeset       1421312  1 nvidia_drm
nvidia_uvm           3899392  0
nvidia              70721536  2 nvidia_uvm,nvidia_modeset
video                  77824  1 nvidia_modeset
drm_kms_helper        274432  2 nvidia_drm
drm                   782336  4 drm_kms_helper,nvidia,nvidia_drm
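
Note that grep nvidia also matches lines such as video and drm, because nvidia shows up in their "used by" column. As a rough sketch, the check can be made stricter by comparing only the module name in the first column. The function name active_gpu_driver is hypothetical.

```shell
# Hypothetical helper: reads `lsmod` output on stdin and prints which GPU
# kernel driver is loaded: "nvidia", "nouveau", or "none".
# Only the first column (the module name) is compared, so modules that
# merely list nvidia in their "used by" column do not count.
active_gpu_driver() {
    awk '
        $1 == "nvidia"  { found_nvidia = 1 }
        $1 == "nouveau" { found_nouveau = 1 }
        END {
            if (found_nvidia)       print "nvidia"
            else if (found_nouveau) print "nouveau"
            else                    print "none"
        }'
}

# Usage on a real machine:
#   lsmod | active_gpu_driver
```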

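Disable Secure Boot in the BIOS

For the NVIDIA GPU driver to work on CentOS Stream 9, Secure Boot must be disabled in the motherboard's BIOS if the board uses UEFI firmware to boot the operating system. (The alternative, signing the kernel modules with a key you enroll yourself, is not covered here.)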
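
Before rebooting into the firmware setup, you can check the current state from the running system with mokutil --sb-state. The sketch below wraps that in a small function; the name secure_boot_state is hypothetical, and it assumes the mokutil output contains the text "SecureBoot enabled" or "SecureBoot disabled". Anything else, for example on a machine booted in legacy BIOS mode, is reported as unknown.

```shell
# Hypothetical helper: reads the output of `mokutil --sb-state` on stdin
# and prints "enabled", "disabled", or "unknown".
secure_boot_state() {
    local out
    out=$(cat)
    case "$out" in
        *"SecureBoot enabled"*)  echo "enabled" ;;
        *"SecureBoot disabled"*) echo "disabled" ;;
        *)                       echo "unknown" ;;
    esac
}

# Usage on a real machine (mokutil must be installed):
#   mokutil --sb-state 2>&1 | secure_boot_state
```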

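Enable the EPEL repository on CentOS Stream 9

To install the NVIDIA GPU driver on CentOS Stream 9, you need the build tools and dependency libraries required to compile the NVIDIA kernel modules. Some of them are found in the EPEL repository for CentOS Stream 9.

In this section, I will show you how to enable the EPEL repository on CentOS Stream 9.

2.1 First, update the DNF package repository cache with the following command:

sudo dnf makecache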

Enable the official CentOS Stream 9 CRB package repository with the following command:

sudo dnf config-manager --set-enabled crb

Install the epel-release and epel-next-release packages with the following command:

sudo dnf install epel-release epel-next-release

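To confirm the installation, press Y and then Enter.

To confirm the GPG key, press Y and then Enter.

The epel-release and epel-next-release packages should now be installed and the EPEL repository enabled.

For the changes to take effect, update the DNF package repository cache with the following command: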

sudo dnf makecache
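
To double-check that the repositories really are enabled, you could feed the output of dnf repolist --enabled into a small filter like the sketch below. The function name missing_repos is hypothetical, and the repo ids crb, epel and epel-next are my assumption of how they appear on CentOS Stream 9; compare them with your own repolist output.

```shell
# Hypothetical helper: reads `dnf repolist --enabled` output on stdin and
# prints every repo id given as an argument that is NOT in the list.
# Prints nothing when all of them are enabled.
missing_repos() {
    local listing wanted
    listing=$(awk '{ print $1 }')   # first column holds the repo id
    for wanted in "$@"; do
        printf '%s\n' "$listing" | grep -qxF -- "$wanted" || echo "$wanted"
    done
}

# Usage on a real machine (repo ids are assumptions):
#   dnf repolist --enabled | missing_repos crb epel epel-next
```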
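
2.2 Install the dependencies and build tools needed to compile the NVIDIA kernel modules

To install the build tools and dependency libraries needed to compile the NVIDIA kernel modules, run the following command: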

sudo dnf install kernel-headers-$(uname -r) kernel-devel-$(uname -r) tar bzip2 make automake gcc gcc-c++ pciutils elfutils-libelf-devel libglvnd-opengl libglvnd-glx libglvnd-devel acpid pkgconfig dkms

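To confirm the installation, press Y and then Enter.

The required packages are downloaded from the internet. This takes a while.

Once the packages are downloaded, you are asked to confirm the GPG key of the official CentOS package repository.

To confirm the GPG key, press Y and then Enter.

To confirm the GPG key of the EPEL repository, press Y and then Enter.

The installation should then continue.

At this point, the dependency libraries and build tools needed to compile the NVIDIA kernel modules should be installed.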
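
One thing that can go wrong here: if the system update in step 1 installed a newer kernel and you have not rebooted yet, uname -r still reports the old kernel, and the matching kernel-devel package may be unavailable or useless after the next boot. The kernel-devel package installs its files under /usr/src/kernels/<release>, so a quick sanity check could look like the sketch below. The function name kernel_devel_present is hypothetical, and the optional second argument exists only so the function can be tried against a scratch directory.

```shell
# Hypothetical helper: succeeds if kernel-devel files for the given kernel
# release are present under /usr/src/kernels.
# $1 = kernel release (for example the output of `uname -r`)
# $2 = optional root prefix, only useful for trying the function out
kernel_devel_present() {
    local release=$1 root=${2:-}
    [ -d "$root/usr/src/kernels/$release" ]
}

# Usage on a real machine:
#   kernel_devel_present "$(uname -r)" || echo "no kernel-devel for the running kernel; reboot into the updated kernel and reinstall"
```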

2.3 Add the official NVIDIA CUDA package repository on CentOS Stream 9

To add the official NVIDIA CUDA package repository on CentOS Stream 9, run the following command:

sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel9/$(uname -i)/cuda-rhel9.repo

For the changes to take effect, update the DNF package repository cache with the following command:

sudo dnf makecache
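
The repository URL depends on the machine architecture. On my x86_64 machine the directory name equals the architecture, but as far as I know NVIDIA publishes the ARM server packages under a directory called sbsa rather than aarch64. The sketch below builds the URL explicitly from uname -m; the function name cuda_repo_url is hypothetical, the sbsa mapping is an assumption you should verify on NVIDIA's download site, and it uses https where the command above may use http.

```shell
# Hypothetical helper: prints the CUDA repo file URL for RHEL 9 compatible
# systems, given the machine architecture as reported by `uname -m`.
cuda_repo_url() {
    local machine=$1 arch_dir
    case "$machine" in
        x86_64)  arch_dir=x86_64 ;;
        aarch64) arch_dir=sbsa ;;   # assumption: NVIDIA's ARM server directory
        *)       echo "unsupported architecture: $machine" >&2; return 1 ;;
    esac
    echo "https://developer.download.nvidia.com/compute/cuda/repos/rhel9/$arch_dir/cuda-rhel9.repo"
}

# Usage on a real machine:
#   sudo dnf config-manager --add-repo "$(cuda_repo_url "$(uname -m)")"
```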

2.4 Install the latest NVIDIA GPU driver on CentOS Stream 9

To install the latest version of the NVIDIA GPU driver on CentOS Stream 9, run the following command:

sudo dnf module install nvidia-driver:latest-dkms

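To confirm the installation, press Y and then Enter.

All the NVIDIA GPU driver packages and their dependencies are downloaded from the internet. This takes a while.

Once the packages are downloaded, you are asked to confirm the GPG key of the official NVIDIA package repository. Press Y and then Enter to confirm it.

The installation should then continue. It takes a while to complete.

On my machine this step failed with the following error: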

Last metadata expiration check: 0:05:51 ago on Fri 11 Apr 2025 03:30:46 PM CST.
Error: 
 Problem 1: package nvidia-driver-libs-3:570.124.06-1.el9.x86_64 from cuda-rhel9-x86_64 requires egl-wayland(x86-64) >= 1.1.13.1-3, but none of the providers can be installed
  - cannot install the best candidate for the job
  - package egl-wayland-1.1.13.1-3.el9.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
  - package egl-wayland-1.1.19~20250313gitf1fd514-1.el9.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
 Problem 2: package nvidia-driver-3:570.124.06-1.el9.x86_64 from cuda-rhel9-x86_64 requires nvidia-driver-libs(x86-64) = 3:570.124.06, but none of the providers can be installed
  - package nvidia-driver-3:570.124.06-1.el9.x86_64 from cuda-rhel9-x86_64 requires libnvidia-glvkspirv.so.570.124.06()(64bit), but none of the providers can be installed
  - package nvidia-driver-3:570.124.06-1.el9.x86_64 from cuda-rhel9-x86_64 requires libnvidia-gpucomp.so.570.124.06()(64bit), but none of the providers can be installed
  - package nvidia-driver-libs-3:570.124.06-1.el9.x86_64 from cuda-rhel9-x86_64 requires egl-wayland(x86-64) >= 1.1.13.1-3, but none of the providers can be installed
  - cannot install the best candidate for the job
  - package egl-wayland-1.1.13.1-3.el9.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
  - package egl-wayland-1.1.19~20250313gitf1fd514-1.el9.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
 Problem 3: package xorg-x11-nvidia-3:570.124.06-1.el9.x86_64 from cuda-rhel9-x86_64 requires libnvidia-glcore.so.570.124.06()(64bit), but none of the providers can be installed
  - package xorg-x11-nvidia-3:570.124.06-1.el9.x86_64 from cuda-rhel9-x86_64 requires libnvidia-tls.so.570.124.06()(64bit), but none of the providers can be installed
  - package nvidia-xconfig-3:570.124.06-1.el9.x86_64 from cuda-rhel9-x86_64 requires xorg-x11-nvidia(x86-64) >= 3:570.124.06, but none of the providers can be installed
  - package nvidia-driver-libs-3:570.124.06-1.el9.x86_64 from cuda-rhel9-x86_64 requires egl-wayland(x86-64) >= 1.1.13.1-3, but none of the providers can be installed
  - cannot install the best candidate for the job
  - package egl-wayland-1.1.13.1-3.el9.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
  - package egl-wayland-1.1.19~20250313gitf1fd514-1.el9.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
 Problem 4: package nvidia-driver-3:570.124.06-1.el9.x86_64 from cuda-rhel9-x86_64 requires nvidia-driver-libs(x86-64) = 3:570.124.06, but none of the providers can be installed
  - package nvidia-driver-3:570.124.06-1.el9.x86_64 from cuda-rhel9-x86_64 requires libnvidia-glvkspirv.so.570.124.06()(64bit), but none of the providers can be installed
  - package nvidia-driver-3:570.124.06-1.el9.x86_64 from cuda-rhel9-x86_64 requires libnvidia-gpucomp.so.570.124.06()(64bit), but none of the providers can be installed
  - package nvidia-settings-3:570.124.06-1.el9.x86_64 from cuda-rhel9-x86_64 requires nvidia-driver(x86-64) = 3:570.124.06, but none of the providers can be installed
  - package nvidia-driver-libs-3:570.124.06-1.el9.x86_64 from cuda-rhel9-x86_64 requires egl-wayland(x86-64) >= 1.1.13.1-3, but none of the providers can be installed
  - cannot install the best candidate for the job
  - package egl-wayland-1.1.13.1-3.el9.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
  - package egl-wayland-1.1.19~20250313gitf1fd514-1.el9.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
(try to add '--skip-broken' to skip uninstallable packages or '--nobest' to use not only best candidate packages)

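None of the fixes suggested by the LLMs I asked worked. In the end I noticed the hint in parentheses at the end of the error log, and changing the command as follows made the installation succeed: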

sudo dnf module install nvidia-driver:latest-dkms --skip-broken
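
Keep in mind that --skip-broken lets dnf leave out packages it cannot install, so it is worth knowing which packages caused the trouble and checking them afterwards. In the log above they are the ones reported as "filtered out by modular filtering". As a rough sketch, the following filter pulls those package names out of a saved error log; the function name filtered_packages and the file name install.log are hypothetical.

```shell
# Hypothetical helper: reads a dnf error log on stdin and prints each
# package reported as "filtered out by modular filtering", once.
filtered_packages() {
    sed -n 's/^ *- package \(.*\) from .* is filtered out by modular filtering$/\1/p' \
        | LC_ALL=C sort -u
}

# Usage, assuming the error output was saved to install.log:
#   filtered_packages < install.log
# Afterwards, check whether the package ended up installed anyway:
#   rpm -q egl-wayland
```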
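
2.5 For the changes to take effect, restart the computer with the following command:

sudo reboot

Check that the NVIDIA driver is installed correctly

After the computer boots, you should see the proprietary NVIDIA GPU driver in use rather than the open-source Nouveau GPU driver: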

lsmod | grep nvidia
lsmod | grep nouveau

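You should also find the NVIDIA X Server Settings application in the CentOS Stream 9 application menu. Click it to launch it.

The NVIDIA X Server Settings application should start without any errors and show plenty of information about your installed NVIDIA GPU.

2.6 Testing

You should also be able to run the NVIDIA command-line tools, such as nvidia-smi: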

[root@cheng ~]# nvidia-smi

Sun Dec 22 14:37:55 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.01              Driver Version: 565.57.01      CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3060        Off |   00000000:06:00.0 Off |                  N/A |
| 31%   23C    P8              6W /  170W |      18MiB /  12288MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
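
The "CUDA Version" in the nvidia-smi header is the highest CUDA version the installed driver supports, not the version of an installed toolkit. If you want these two numbers in a script, a minimal sketch is to pull them out of the header line. The function name smi_versions and the output format driver=... cuda_max=... are my own choices.

```shell
# Hypothetical helper: reads `nvidia-smi` output on stdin and prints the
# driver version and the highest CUDA version the driver supports.
smi_versions() {
    sed -n 's/.*Driver Version: *\([0-9.]*\).*CUDA Version: *\([0-9.]*\).*/driver=\1 cuda_max=\2/p'
}

# Usage on a real machine:
#   nvidia-smi | smi_versions
# nvidia-smi can also print selected fields directly:
#   nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
```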

2. cuda-toolkit installation

2.1 Installation

See the official site: CUDA Toolkit 12.8 Update 1 Downloads
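
The downloads page generates the exact commands for the platform you select, so treat it as the authority. Since the NVIDIA CUDA repository was already added in step 2.3, my understanding is that the network RPM route comes down to installing a versioned package whose name is the toolkit version with the dot replaced by a dash, for example cuda-toolkit-12-8. The sketch below only builds that name; the function name cuda_toolkit_pkg is hypothetical, and the naming pattern is an assumption to check against the downloads page.

```shell
# Hypothetical helper: turns a toolkit version such as 12.8 into the
# package name pattern used in NVIDIA's repository (assumed pattern:
# cuda-toolkit-<major>-<minor>).
cuda_toolkit_pkg() {
    echo "cuda-toolkit-$(echo "$1" | tr '.' '-')"
}

# Usage on a real machine:
#   sudo dnf install "$(cuda_toolkit_pkg 12.8)"
```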

2.2 Environment configuration

Global configuration, effective for all users:

[chenfeng@iZ2ze8ss1mj33afx13mulcZ temp]$ sudo vim /etc/profile
Append the following to the end of the file:

export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

Then restart the terminal or run source /etc/profile.
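
Two small caveats with the plain export lines. If LD_LIBRARY_PATH was empty before, the result ends in a colon, and the dynamic linker treats an empty entry as the current directory. Also, sourcing the file repeatedly keeps prepending the same directory. A minimal sketch that avoids both is shown below; the function name prepend_path is hypothetical, and putting it in a drop-in file such as /etc/profile.d/cuda.sh (file name is my choice) instead of editing /etc/profile is only a suggestion.

```shell
# Hypothetical helper: prints the path list $2 with directory $1 in front,
# unless $1 is already in the list. Never leaves an empty entry behind.
prepend_path() {
    local dir=$1 current=${2:-}
    case ":$current:" in
        *":$dir:"*) echo "$current" ;;                   # already present
        *)          echo "$dir${current:+:$current}" ;;  # no trailing colon when empty
    esac
}

# Usage, for example in /etc/profile.d/cuda.sh:
#   export PATH=$(prepend_path /usr/local/cuda/bin "$PATH")
#   export LD_LIBRARY_PATH=$(prepend_path /usr/local/cuda/lib64 "${LD_LIBRARY_PATH:-}")
```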

Testing

nvcc --version

[chenfeng@iZ2ze8ss1mj33afx13mulcZ temp]$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Fri_Feb_21_20:23:50_PST_2025
Cuda compilation tools, release 12.8, V12.8.93
Build cuda_12.8.r12.8/compiler.35583870_0
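
For scripts it can be handy to extract just the release number and compare it with the highest CUDA version the driver reports in nvidia-smi. As a rule of thumb the toolkit release should not be newer than that, although NVIDIA's minor-version compatibility relaxes this within one major release, so treat a mismatch as a hint rather than an error. The function names nvcc_release and version_le are hypothetical, and version_le relies on sort -V from GNU coreutils.

```shell
# Hypothetical helper: reads `nvcc --version` output on stdin and prints
# the release number, for example 12.8.
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# Hypothetical helper: succeeds if version $1 is lower than or equal to
# version $2 (uses GNU sort -V for version ordering).
version_le() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Usage on a real machine (12.7 stands for the value shown by nvidia-smi):
#   version_le "$(nvcc --version | nvcc_release)" 12.7 || echo "toolkit is newer than the driver's CUDA version"
```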