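After installing ESXi, go to Manage > Hardware > PCI Devices, check the GPU you want to pass through, and click "Toggle passthrough".
Once toggled, the Passthrough column in the list should show "Active".
Then edit the VM settings. Under CPU, turn off hardware virtualization ("Expose hardware assisted virtualization to the guest OS").
Under Memory, check the reservation option ("Reserve all guest memory (All locked)").
Under VM Options > Boot Options, disable UEFI Secure Boot (this must be off; it is important).
Choose Add other device > PCI device. A "New PCI device" entry appears at the very bottom of the configuration page; select the device you want to pass through.
Repeat this for each GPU you want to pass through.
Other articles recommend adding the GPU's audio device as well, so I added it too to avoid unknown problems.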
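I installed Ubuntu 22.04 Server with kernel 5.19.0-32. During installation, do not check the box that installs third-party graphics or Wi-Fi drivers. Once the system is installed, and before installing the NVIDIA driver, be sure to run apt update and then apt upgrade so that every package, especially the kernel-related ones, is at its latest version.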
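The .run installer used later builds a kernel module against the running kernel, so after the upgrade it is worth rebooting and confirming that the build directory for that kernel exists. Below is a minimal sketch of such a check. The function name and the MODULES_ROOT override are my own (hypothetical), and the assumption is that the headers package provides /lib/modules/<release>/build, as on stock Ubuntu.

```shell
#!/usr/bin/env bash
# Sketch: check that the kernel build directory for a given release exists.
# MODULES_ROOT is a hypothetical override (default /lib/modules) that only
# exists to make the check easy to try against a scratch directory.
have_kernel_build_dir() {
    local release="$1"
    local root="${MODULES_ROOT:-/lib/modules}"
    # On Ubuntu the linux-headers package provides <root>/<release>/build.
    [ -e "$root/$release/build" ]
}

# Typical use on the VM, after 'sudo apt update && sudo apt upgrade' and a reboot:
#   have_kernel_build_dir "$(uname -r)" || sudo apt install "linux-headers-$(uname -r)"
```

If the check fails right after an upgrade, the usual cause is that the VM is still running the old kernel and simply needs a reboot.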
Install the compiler and related packages as follows:
sudo apt install libglvnd-core-dev libglvnd-dev build-essential
With the basic build environment in place, open this page in your browser:
https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=22.04&target_type=deb_local
If your system differs from mine, pick the options that match your OS; if it is the same, the link should work as is. You end up with a command list like this:
- wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
- sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
- wget https://developer.download.nvidia.com/compute/cuda/12.4.1/local_installers/cuda-repo-ubuntu2204-12-4-local_12.4.1-550.54.15-1_amd64.deb
- sudo dpkg -i cuda-repo-ubuntu2204-12-4-local_12.4.1-550.54.15-1_amd64.deb
- sudo cp /var/cuda-repo-ubuntu2204-12-4-local/cuda-*-keyring.gpg /usr/share/keyrings/
- sudo apt-get update
- sudo apt-get -y install cuda-toolkit-12-4
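If your network is slow, you can download the .deb package from the third command (the wget) ahead of time with a download manager such as Xunlei and then upload it to the VM. Once all of the commands above have run, this step is done.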
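If you do upload the .deb by hand, the wget step can be made to skip itself. This is only a sketch; fetch_if_missing is a hypothetical helper name, and it treats any non-empty file with the right name as a good download without verifying its contents.

```shell
#!/usr/bin/env bash
# Sketch: download a file only when a non-empty copy is not already present.
# fetch_if_missing is a hypothetical helper, not part of any NVIDIA tooling.
fetch_if_missing() {
    local url="$1"
    local dest="${2:-$(basename "$url")}"
    if [ -s "$dest" ]; then
        echo "using existing $dest"
    else
        wget -O "$dest" "$url"
    fi
}

# Example with the installer URL from the command list above:
#   fetch_if_missing https://developer.download.nvidia.com/compute/cuda/12.4.1/local_installers/cuda-repo-ubuntu2204-12-4-local_12.4.1-550.54.15-1_amd64.deb
```

The -s test also rejects a zero-byte file left behind by an interrupted download, so a failed wget is retried on the next run.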
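Download the matching driver .run file from the Official Drivers | NVIDIA page. Unlike installing on a physical machine, this step has to use the .run file, mainly because it needs the -m=kernel-open parameter.
Run the .run file you downloaded, for example: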
- sudo chmod +x ./NVIDIA-Linux-x86_64-550.107.02.run
- sudo ./NVIDIA-Linux-x86_64-550.107.02.run -m=kernel-open
The installer presents a lot of choices; follow the prompts.
On Ubuntu Server, just follow the prompts through to the end, reboot, and the GPU is ready to use.
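After the reboot it can be worth checking which flavor of the kernel module actually loaded, since the whole point of -m=kernel-open is to get the open one. Here is a small sketch that classifies the module's license string. The name nvidia_module_flavor is hypothetical. "NVIDIA" is the license string the proprietary module reports in the dmesg output further down; "Dual MIT/GPL" for the open modules is my assumption about what modinfo prints.

```shell
#!/usr/bin/env bash
# Sketch: classify the license string of the loaded nvidia kernel module.
# Assumption: the open kernel modules report a GPL-compatible license
# ("Dual MIT/GPL"), while the proprietary module reports "NVIDIA".
nvidia_module_flavor() {
    case "$1" in
        *GPL*)  echo "open" ;;
        NVIDIA) echo "proprietary" ;;
        *)      echo "unknown" ;;
    esac
}

# On the VM:
#   nvidia_module_flavor "$(modinfo -F license nvidia)"
```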
Next comes the rest of the CUDA development environment (the link below is the cuDNN download page):
https://developer.nvidia.com/cudnn-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=22.04&target_type=deb_local
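This works like the earlier step: open the address above, pick the options that match your system, and you get a command list roughly like the one below. Run it as instructed.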
- wget https://developer.download.nvidia.com/compute/cudnn/9.1.1/local_installers/cudnn-local-repo-ubuntu2204-9.1.1_1.0-1_amd64.deb
- sudo dpkg -i cudnn-local-repo-ubuntu2204-9.1.1_1.0-1_amd64.deb
- sudo cp /var/cudnn-local-repo-ubuntu2204-9.1.1/cudnn-*-keyring.gpg /usr/share/keyrings/
- sudo apt-get update
- sudo apt-get -y install cudnn
After installation, set up the environment variables: open /etc/profile with vi and append the following at the end of the file:
- export PATH=/usr/local/cuda/bin:$PATH
- export CPATH=$CPATH:/usr/include:/usr/local/cuda/include
- export LIBRARY_PATH=$LIBRARY_PATH:/usr/lib/x86_64-linux-gnu:/usr/local/cuda/lib64
- export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/x86_64-linux-gnu:/usr/local/cuda/lib64
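If you would rather script this than edit the file in vi, a small helper can append each export only when it is not already there, so re-running the setup does not pile up duplicate lines. This is just a sketch; append_once is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: append a line to a file only if that exact line is not present yet.
# append_once is a hypothetical helper name.
append_once() {
    local line="$1" file="$2"
    # -F fixed string, -x whole-line match, -q quiet
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# As root on the VM (single quotes keep $PATH literal in the file):
#   append_once 'export PATH=/usr/local/cuda/bin:$PATH' /etc/profile
```

/etc/profile is read at login, so log in again or run `source /etc/profile` before checking that nvcc is found.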
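At this point the NVIDIA driver and CUDA are installed in the VM. The steps above are for Ubuntu 22.04 Server. If you are using Ubuntu 22.04 Desktop, install as follows instead: before running the .run file, edit the files below and add the contents shown: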
sudo nano /etc/modprobe.d/blacklist-nvidia-nouveau.conf
- blacklist nouveau
- options nouveau modeset=0
sudo nano /etc/modprobe.d/nvidia.conf
options nvidia NVreg_OpenRmEnableUnsupportedGpus=1
sudo update-initramfs -u
reboot
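After this reboot, and before running the installer, you can confirm that the blacklist took effect. The sketch below reads `lsmod` output and reports whether nouveau is still loaded; nouveau_loaded is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: succeed (exit 0) if the lsmod output on stdin lists nouveau.
# nouveau_loaded is a hypothetical helper name.
nouveau_loaded() {
    # The module name is the first column of lsmod output.
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}

# On the VM:
#   lsmod | nouveau_loaded && echo "nouveau is still loaded, check the blacklist files"
```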
After the reboot, upload the downloaded .run file to the Ubuntu 22.04 Desktop VM, make it executable with chmod +x, and then run:
./NVIDIA-Linux-x86_64-550.107.02.run -m=kernel-open
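Answer yes wherever yes is offered. When the installation finishes, reboot and run nvidia-smi; it should list the passed-through GPUs.
That completes the installation. Running AI workloads should be fine, but using the Ubuntu GUI normally seems to be a problem. It does not work for me, and I do not know where the problem is.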
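For a check that is easier to script than reading the nvidia-smi table, the query mode prints one CSV line per GPU. The sketch below summarizes that output; summarize_gpus is a hypothetical helper name, and it assumes the name and driver version contain no commas.

```shell
#!/usr/bin/env bash
# Sketch: summarize the CSV printed by
#   nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
# summarize_gpus is a hypothetical helper; it exits non-zero when no GPU is listed.
summarize_gpus() {
    awk -F', *' '
        NF >= 2 { n++; printf "%d: %s (driver %s)\n", n, $1, $2 }
        END     { if (n == 0) { print "no GPUs reported"; exit 1 } }
    '
}

# On the VM:
#   nvidia-smi --query-gpu=name,driver_version --format=csv,noheader | summarize_gpus
```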
-----------------------------------------------------------------------------------------------------------------------------
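The material below is notes from my back-and-forth attempts to get this working. You can skip it: installing as described above is enough. There is no need to change any VM configuration parameters, or to manually set some parameter to 0 or to TRUE; it just works.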
- 1. Go to /etc/modprobe.d/
- 2. Create a file: blacklist-nvidia-nouveau.conf
- 3. Put this in the file:
- blacklist nouveau
- options nouveau modeset=0
- 4. Create another file, nvidia.conf, and put this in it: options nvidia NVreg_OpenRmEnableUnsupportedGpus=1
- 5. Update the kernel initramfs: sudo update-initramfs -u
- 6. Reboot
- 7. Go to the NVIDIA site and open the download page for the driver you want.
- 8. Copy the URL of the download button and paste it after wget to download the file to the current folder.
- 9. sudo chmod 700 the file
- 10. Run the installer: sudo ./filename.run
- a. For the open-source kernel modules: sudo ./filename.run -m=kernel-open
- b. Watch out! Make sure nvidia.conf exists in the modprobe.d folder and that you have run update-initramfs and rebooted. Only then will the GTX/RTX/Quadro cards work!
- 11. After the installation, reboot the server.
- 12. Test with nvidia-smi
- 13. Great success!
pciPassthru.use64bitMMIO set to TRUE
VMkernel.Boot.disableACSCheck set to TRUE
pciPassthru.64bitMMIOSizeGB set to 64
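For context on where a value like 64 could come from: a rule of thumb I have seen in VMware guidance is to add up the memory of all passed-through GPUs and go to the next power of two above that total. The sketch below does that arithmetic. Treat the rule itself as my assumption, not something from these notes, and mmio_size_gb as a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: suggest a pciPassthru.64bitMMIOSizeGB value.
# Assumed rule: total GPU memory in GB, then the next power of two
# strictly above that total. Verify against VMware's documentation.
mmio_size_gb() {
    local gpus="$1" gb_each="$2"
    local total=$(( gpus * gb_each ))
    local size=1
    while [ "$size" -le "$total" ]; do
        size=$(( size * 2 ))
    done
    echo "$size"
}

# Two 24 GB RTX 3090 cards: 48 GB total, next power of two is 64.
#   mmio_size_gb 2 24
```

Under that rule, the two RTX 3090 cards shown in the lspci output below give 64, which matches the value in these notes.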
nano /etc/modprobe.d/blacklist.conf, append the following at the end of the file, then reboot:
- blacklist nouveau
- blacklist rivafb
- blacklist nvidiafb
- blacklist rivatv
- blacklist nv
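If you did not disable UEFI Secure Boot in the earlier step, the driver installer prompts you to enter a password twice. Remember that password; you will need it after the reboot shortly afterward.
After the reboot a menu appears. Choose the second option, Enroll MOK, then Continue, and keep going until you are asked for the password; enter it and reboot.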
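To find out from inside the guest whether Secure Boot is actually on, and therefore whether this MOK step applies at all, `mokutil --sb-state` reports it. The sketch below reduces that output to a single word; secure_boot_state is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: reduce the output of `mokutil --sb-state` (read from stdin)
# to "enabled", "disabled", or "unknown".
# secure_boot_state is a hypothetical helper name.
secure_boot_state() {
    case "$(cat)" in
        *"SecureBoot enabled"*)  echo "enabled" ;;
        *"SecureBoot disabled"*) echo "disabled" ;;
        *)                       echo "unknown" ;;
    esac
}

# On the VM:
#   mokutil --sb-state | secure_boot_state
```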
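If everything is installed but no device is found, the following commands help with troubleshooting.
nvidia-smi: when I ran it at this point, it reported no devices.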
- root@vgpu1:~# lspci | grep NVIDIA
- 03:00.0 VGA compatible controller: NVIDIA Corporation GA102 [GeForce RTX 3090] (rev a1)
- 03:00.1 Audio device: NVIDIA Corporation GA102 High Definition Audio Controller (rev a1)
- 03:01.0 VGA compatible controller: NVIDIA Corporation GA102 [GeForce RTX 3090] (rev a1)
- 03:01.1 Audio device: NVIDIA Corporation GA102 High Definition Audio Controller (rev a1)
- root@vgpu1:~# lsmod | grep nvidia
- nvidia_uvm 4677632 0
- nvidia_drm 98304 0
- nvidia_modeset 1343488 1 nvidia_drm
- nvidia 54030336 2 nvidia_uvm,nvidia_modeset
- drm_kms_helper 200704 2 vmwgfx,nvidia_drm
- drm 581632 8 vmwgfx,drm_kms_helper,nvidia,drm_ttm_helper,nvidia_drm,ttm
- root@vgpu1:~# dmesg | grep nvidia
- [ 6.244921] audit: type=1400 audit(1723624822.923:4): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=785 comm="apparmor_parser"
- [ 6.244926] audit: type=1400 audit(1723624822.923:5): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=785 comm="apparmor_parser"
- [ 6.898620] nvidia: loading out-of-tree module taints kernel.
- [ 6.898632] nvidia: module license 'NVIDIA' taints kernel.
- [ 7.046885] nvidia-nvlink: Nvlink Core is being initialized, major device number 236
- [ 7.048732] nvidia 0000:03:00.0: enabling device (0000 -> 0003)
- [ 7.049766] nvidia 0000:03:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
- [ 7.099483] nvidia 0000:03:01.0: enabling device (0000 -> 0003)
- [ 7.100327] nvidia 0000:03:01.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
- [ 7.220493] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 550.54.15 Tue Mar 5 21:59:57 UTC 2024
- [ 7.257032] [drm] [nvidia-drm] [GPU ID 0x00000300] Loading driver
- [ 8.143173] [drm:nv_drm_load [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000300] Failed to allocate NvKmsKapiDevice
- [ 8.143332] [drm:nv_drm_register_drm_device [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000300] Failed to register device
- [ 8.143448] [drm] [nvidia-drm] [GPU ID 0x00000301] Loading driver
- [ 9.024981] [drm:nv_drm_load [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000301] Failed to allocate NvKmsKapiDevice
- [ 9.025120] [drm:nv_drm_register_drm_device [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000301] Failed to register device
- [ 9.111670] nvidia_uvm: module uses symbols nvUvmInterfaceDisableAccessCntr from proprietary module nvidia, inheriting taint.
- [ 9.141323] nvidia-uvm: Loaded the UVM driver, major device number 234.
- root@vgpu1:~# lshw -c video
- *-display
- description: VGA compatible controller
- product: SVGA II Adapter
- vendor: VMware
- physical id: f
- bus info: pci@0000:00:0f.0
- logical name: /dev/fb0
- version: 00
- width: 32 bits
- clock: 33MHz
- capabilities: vga_controller bus_master cap_list rom fb
- configuration: depth=32 driver=vmwgfx latency=64 resolution=1176,885
- resources: irq:16 ioport:840(size=16) memory:f0000000-f7ffffff memory:ff000000-ff7fffff memory:c0000-dffff
- *-display:0
- description: VGA compatible controller
- product: GA102 [GeForce RTX 3090]
- vendor: NVIDIA Corporation
- physical id: 0
- bus info: pci@0000:03:00.0
- version: a1
- width: 64 bits
- clock: 33MHz
- capabilities: pm msi pciexpress vga_controller bus_master cap_list
- configuration: driver=nvidia latency=248
- resources: irq:17 memory:fc000000-fcffffff memory:c0000000-cfffffff memory:d2000000-d3ffffff ioport:d80(size=128)
- *-display:1
- description: VGA compatible controller
- product: GA102 [GeForce RTX 3090]
- vendor: NVIDIA Corporation
- physical id: 26
- bus info: pci@0000:03:01.0
- version: a1
- width: 64 bits
- clock: 33MHz
- capabilities: pm msi pciexpress vga_controller bus_master cap_list
- configuration: driver=nvidia latency=248
- resources: irq:18 memory:fb000000-fbffffff memory:b0000000-bfffffff memory:d0000000-d1ffffff ioport:d00(size=128)
sudo journalctl -b
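In the dmesg output above, the lines that matter are the nvidia-drm ones marked *ERROR*; everything else is normal module loading. Here is a small sketch for pulling just those out of a kernel log; nvidia_drm_errors is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: print only the nvidia-drm error lines from kernel log text on stdin.
# nvidia_drm_errors is a hypothetical helper name.
nvidia_drm_errors() {
    # -F treats the patterns as fixed strings, so '*' is matched literally.
    grep -F '*ERROR*' | grep -F 'nvidia-drm'
}

# On the VM (either source works):
#   sudo dmesg | nvidia_drm_errors
#   sudo journalctl -k -b | nvidia_drm_errors
```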
If you need to uninstall an old driver or remove a broken one, you can try saving the following as a shell script and running it:
- sudo nvidia-uninstall
- sudo apt purge -y '^nvidia-*' '^libnvidia-*'
- sudo rm -r /var/lib/dkms/nvidia
- sudo apt -y autoremove
- sudo update-initramfs -c -k `uname -r`
- sudo update-grub2
- read -p "Press any key to reboot... " -n1 -s
- sudo reboot
Reference: https://linuxconfig.org/how-to-install-the-nvidia-drivers-on-manjaro-linux