1: Enable hardware-assisted virtualization support in the BIOS
For Intel CPUs, enable the VT-x and VT-d options on the motherboard:
VT-x is required for virtualization.
VT-d is required for PCI passthrough.
These two options are usually found in the BIOS under Advanced, in the CPU and System (or similar) menus, for example:
VT: Intel Virtualization Technology
VT-d: Intel VT for Directed I/O
For AMD CPUs, enable the SVM and IOMMU options on the motherboard:
SVM is required for virtualization.
IOMMU is required for PCI passthrough.
2: Confirm that the kernel has IOMMU support enabled
cat /proc/cmdline | grep iommu
If there is no output, the kernel boot parameters need to be changed.
For Intel CPUs:
1. Edit /etc/default/grub and append intel_iommu=on to the GRUB_CMDLINE_LINUX line, for example:
GRUB_CMDLINE_LINUX="crashkernel=auto rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet intel_iommu=on"
If there is no GRUB_CMDLINE_LINUX line, use GRUB_CMDLINE_LINUX_DEFAULT instead.
2. Regenerate the grub configuration:
grub2-mkconfig -o /boot/grub2/grub.cfg
If the system boots with UEFI, regenerate the UEFI boot configuration instead:
grub2-mkconfig -o /boot/efi/EFI/centos/grub.cfg
Then reboot the machine.
For AMD CPUs the only difference is that the parameter to add is amd_iommu=on instead of intel_iommu=on; the other steps are the same.
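A quick way to confirm after the reboot that the IOMMU really came up (a minimal sanity check; the dmesg patterns below are the usual Intel/AMD messages and may vary slightly between kernels):
cat /proc/cmdline | grep iommu        # the new parameter should now be present
dmesg | grep -e DMAR -e IOMMU         # look for DMAR / "IOMMU enabled" initialisation messages
ls /sys/kernel/iommu_groups/          # non-empty output means IOMMU groups were created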
3: Unbind the NIC from its default driver (skip this step if the device was not previously bound to another driver)
echo "8086 10fb" > /sys/bus/pci/drivers/pci-stub/new_id
echo "0000:81:00.0" > /sys/bus/pci/devices/0000:81:00.0/driver/unbind
echo "0000:81:00.0" > /sys/bus/pci/drivers/pci-stub/bind
At this point the NIC to be passed through is ready.
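As an optional check (a sketch using standard sysfs paths, not part of the original procedure), confirm the rebind and see which other functions share the NIC's IOMMU group; everything in that group has to be handled together:
lspci -vv -s 81:00.0 | grep driver                            # should now report pci-stub
ls /sys/bus/pci/devices/0000:81:00.0/iommu_group/devices/     # all functions in the same IOMMU group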
4: Confirm the PCI device and driver information
[root@compute01 ~]# lspci -nn | grep -i Eth
1a:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Connection X722 for 10GbE SFP+ [8086:37d0] (rev 09)
1a:00.1 Ethernet controller [0200]: Intel Corporation Ethernet Connection X722 for 10GbE SFP+ [8086:37d0] (rev 09)
1a:00.2 Ethernet controller [0200]: Intel Corporation Ethernet Connection X722 for 1GbE [8086:37d1] (rev 09)
1a:00.3 Ethernet controller [0200]: Intel Corporation Ethernet Connection X722 for 1GbE [8086:37d1] (rev 09)
3b:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ [8086:1572] (rev 02)
3b:00.1 Ethernet controller [0200]: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ [8086:1572] (rev 02)
3c:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ [8086:1572] (rev 02)
3c:00.1 Ethernet controller [0200]: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ [8086:1572] (rev 02)
5e:00.0 Ethernet controller [0200]: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection [8086:10fb] (rev 01)
5e:00.1 Ethernet controller [0200]: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection [8086:10fb] (rev 01)
[root@compute01 ~]# lspci -v -s 1a:00.0    ## view the details of a single PCI device
In [8086:10fb], the vendor ID is 8086 and the product ID is 10fb.
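If only the numeric IDs are needed (they are what the nova [pci] alias and passthrough_whitelist entries below expect), a shorter query works; the address 1a:00.0 is just one of the functions listed above:
lspci -n -s 1a:00.0
1a:00.0 0200: 8086:37d0 (rev 09)      # vendor_id:product_id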
Configure OpenStack as follows:
1: Configure nova-scheduler
Add PciPassthroughFilter to enabled_filters in the [filter_scheduler] section, and also add available_filters = nova.scheduler.filters.all_filters:
[filter_scheduler]
host_subset_size = 10
max_io_ops_per_host = 10
enabled_filters = RetryFilter,AvailabilityZoneFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,AggregateCoreFilter,AggregateDiskFilter,DifferentHostFilter,SameHostFilter,PciPassthroughFilter
available_filters = nova.scheduler.filters.all_filters
2: Configure nova-api
Add a new [pci] section:
[pci]
alias = {"vendor_id":"8086","product_id":"10fb","device_type":"type-PCI","name":"a1"}
Restart the nova-api and nova-scheduler containers:
docker restart nova_api nova_scheduler
3: Configure the compute node that hosts the passthrough device
[pci]
passthrough_whitelist = { "vendor_id":"8086","product_id":"10fb" }
alias = { "vendor_id":"8086", "product_id":"10fb", "device_type":"type-PCI", "name":"a1" }
Restart the nova-compute service.
Note: if the passthrough device is a NIC, use "device_type":"type-PF"; if it is a GPU, use "device_type":"type-PCI".
4: Create a flavor that carries the PCI alias
openstack flavor set ml.large --property "pci_passthrough:alias"="a1:1"
Create an instance with this flavor; the scheduler will automatically place it on a node that has the passthrough device. The general form is:
openstack flavor set FLAVOR-NAME --property pci_passthrough:alias=ALIAS:COUNT
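For example, booting a test instance with the flavor above could look like the following sketch; the image name, network name and instance name are placeholders for whatever exists in your environment:
openstack server create --flavor ml.large --image centos7 --network private-net pci-nic-test
# once the instance is ACTIVE, the passed-through 82599ES NIC should be visible inside the guest:
lspci -nn | grep -i eth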
Official documentation:
https://docs.openstack.org/nova/pike/admin/pci-passthrough.html
(Common models: Quadro RTX 6000/8000)
The fix is to pass through all devices that share the same PCI slot (i.e. the same IOMMU group) to a single virtual machine. If only the GPU function is passed through, the instance fails to spawn with the following error:
2020-01-14 23:24:01.468 14281 ERROR nova.virt.libvirt.guest [req-fe905189-9d2e-48a3-a848-82149a686c60 74caf2133c6cabb260b88f1a0eba7e0ef524f70eb00cd1f99a6585b9d5545572 836f840d0035448e9b90a9d8da3fd769 - 397d0639d4e9451b9ff85a3e9d73da43 397d0639d4e9451b9ff85a3e9d73da43] Error launching a defined domain with XML: <domain type='kvm'>
2020-01-14 23:24:01.469 14281 ERROR nova.virt.libvirt.driver [req-fe905189-9d2e-48a3-a848-82149a686c60 74caf2133c6cabb260b88f1a0eba7e0ef524f70eb00cd1f99a6585b9d5545572 836f840d0035448e9b90a9d8da3fd769 - 397d0639d4e9451b9ff85a3e9d73da43 397d0639d4e9451b9ff85a3e9d73da43] [instance: 2f315777-b4bd-4a81-b7cb-3ccbb28ddfc6] Failed to start libvirt guest: libvirtError: internal error: qemu unexpectedly closed the monitor: 2020-01-14T15:24:01.257459Z qemu-kvm: -device vfio-pci,host=06:00.0,id=hostdev0,bus=pci.0,addr=0x5: vfio error: 0000:06:00.0: group 46 is not viable
2020-01-14 23:24:01.641 14281 ERROR nova.compute.manager [req-fe905189-9d2e-48a3-a848-82149a686c60 74caf2133c6cabb260b88f1a0eba7e0ef524f70eb00cd1f99a6585b9d5545572 836f840d0035448e9b90a9d8da3fd769 - 397d0639d4e9451b9ff85a3e9d73da43 397d0639d4e9451b9ff85a3e9d73da43] [instance: 2f315777-b4bd-4a81-b7cb-3ccbb28ddfc6] Instance failed to spawn: libvirtError: internal error: qemu unexpectedly closed the monitor: 2020-01-14T15:24:01.257459Z qemu-kvm: -device vfio-pci,host=06:00.0,id=hostdev0,bus=pci.0,addr=0x5: vfio error: 0000:06:00.0: group 46 is not viable
2020-01-14 23:24:01.641 14281 ERROR nova.compute.manager [instance: 2f315777-b4bd-4a81-b7cb-3ccbb28ddfc6] Traceback (most recent call last):
2020-01-14 23:24:01.641 14281 ERROR nova.compute.manager [instance: 2f315777-b4bd-4a81-b7cb-3ccbb28ddfc6]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2274, in _build_resources
2020-01-14 23:24:01.641 14281 ERROR nova.compute.manager [instance: 2f315777-b4bd-4a81-b7cb-3ccbb28ddfc6]     yield resources
2020-01-14 23:24:01.641 14281 ERROR nova.compute.manager [instance: 2f315777-b4bd-4a81-b7cb-3ccbb28ddfc6]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2054, in _build_and_run_instance
2020-01-14 23:24:01.641 14281 ERROR nova.compute.manager [instance: 2f315777-b4bd-4a81-b7cb-3ccbb28ddfc6]     block_device_info=block_device_info)
2020-01-14 23:24:01.641 14281 ERROR nova.compute.manager [instance: 2f315777-b4bd-4a81-b7cb-3ccbb28ddfc6]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 3170, in spawn
2020-01-14 23:24:01.641 14281 ERROR nova.compute.manager [instance: 2f315777-b4bd-4a81-b7cb-3ccbb28ddfc6]     destroy_disks_on_failure=True)
2020-01-14 23:24:01.641 14281 ERROR nova.compute.manager [instance: 2f315777-b4bd-4a81-b7cb-3ccbb28ddfc6]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5674, in _create_domain_and_network
2020-01-14 23:24:01.641 14281 ERROR nova.compute.manager [instance: 2f315777-b4bd-4a81-b7cb-3ccbb28ddfc6]     destroy_disks_on_failure)
2020-01-14 23:24:01.641 14281 ERROR nova.compute.manager [instance: 2f315777-b4bd-4a81-b7cb-3ccbb28ddfc6]   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-01-14 23:24:01.641 14281 ERROR nova.compute.manager [instance: 2f315777-b4bd-4a81-b7cb-3ccbb28ddfc6]     return self._domain.createWithFlags(flags)
2020-01-14 23:24:01.641 14281 ERROR nova.compute.manager [instance: 2f315777-b4bd-4a81-b7cb-3ccbb28ddfc6]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 186, in doit
2020-01-14 23:24:01.641 14281 ERROR nova.compute.manager [instance: 2f315777-b4bd-4a81-b7cb-3ccbb28ddfc6]     result = proxy_call(self._autowrap, f, *args, **kwargs)
2020-01-14 23:24:01.641 14281 ERROR nova.compute.manager [instance: 2f315777-b4bd-4a81-b7cb-3ccbb28ddfc6]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 144, in proxy_call
2020-01-14 23:24:01.641 14281 ERROR nova.compute.manager [instance: 2f315777-b4bd-4a81-b7cb-3ccbb28ddfc6]     rv = execute(f, *args, **kwargs)
2020-01-14 23:24:01.641 14281 ERROR nova.compute.manager [instance: 2f315777-b4bd-4a81-b7cb-3ccbb28ddfc6]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 125, in execute
2020-01-14 23:24:01.641 14281 ERROR nova.compute.manager [instance: 2f315777-b4bd-4a81-b7cb-3ccbb28ddfc6]     six.reraise(c, e, tb)
2020-01-14 23:24:01.641 14281 ERROR nova.compute.manager [instance: 2f315777-b4bd-4a81-b7cb-3ccbb28ddfc6]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 83, in tworker
2020-01-14 23:24:01.641 14281 ERROR nova.compute.manager [instance: 2f315777-b4bd-4a81-b7cb-3ccbb28ddfc6]     rv = meth(*args, **kwargs)
2020-01-14 23:24:01.641 14281 ERROR nova.compute.manager [instance: 2f315777-b4bd-4a81-b7cb-3ccbb28ddfc6]   File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1110, in createWithFlags
2020-01-14 23:24:01.641 14281 ERROR nova.compute.manager [instance: 2f315777-b4bd-4a81-b7cb-3ccbb28ddfc6]     if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
2020-01-14 23:24:01.641 14281 ERROR nova.compute.manager [instance: 2f315777-b4bd-4a81-b7cb-3ccbb28ddfc6] libvirtError: internal error: qemu unexpectedly closed the monitor: 2020-01-14T15:24:01.257459Z qemu-kvm: -device vfio-pci,host=06:00.0,id=hostdev0,bus=pci.0,addr=0x5: vfio error: 0000:06:00.0: group 46 is not viable
2020-01-14 23:24:01.641 14281 ERROR nova.compute.manager [instance: 2f315777-b4bd-4a81-b7cb-3ccbb28ddfc6] Please ensure all devices within the iommu_group are bound to their vfio bus driver.
2020-01-14 23:24:01.641 14281 ERROR nova.compute.manager [instance: 2f315777-b4bd-4a81-b7cb-3ccbb28ddfc6]
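A quick way to see why the group is "not viable" is to list every function in the GPU's IOMMU group (a sketch using standard sysfs paths, with the address taken from the error above); every function listed must be bound to vfio-pci and assigned to the same guest:
ls /sys/bus/pci/devices/0000:06:00.0/iommu_group/devices/
# on this host the group would be expected to contain all four functions, e.g.:
# 0000:06:00.0  0000:06:00.1  0000:06:00.2  0000:06:00.3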
For the host IOMMU configuration, refer to the configuration described above.
Check the current graphics card device information:
[root@ostack-228-26 ~]# lspci -nn | grep NVID
06:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:1e04] (rev a1)
06:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:10f7] (rev a1)
06:00.2 USB controller [0c03]: NVIDIA Corporation Device [10de:1ad6] (rev a1)
06:00.3 Serial bus controller [0c80]: NVIDIA Corporation Device [10de:1ad7] (rev a1)
As shown above, this host has one VGA-class GPU, and the PCI device actually consists of four functions: VGA, Audio, USB, and Serial bus controller.
Confirm the drivers
Because the NVIDIA driver is not installed on the physical server's operating system, the output looks as follows. Only the USB function is bound to a driver (xhci_hcd), which ships with the server's OS.
lspci -vv -s 06:00.0 | grep driver
lspci -vv -s 06:00.1 | grep driver
lspci -vv -s 06:00.2 | grep driver
Kernel driver in use: xhci_hcd
lspci -vv -s 06:00.3 | grep driver
If the NVIDIA driver were installed, the output might look like this instead:
lspci -vv -s 06:00.0 | grep driver
Kernel driver in use: nvidia
lspci -vv -s 06:00.1 | grep driver
Kernel driver in use: snd_hda_intel
lspci -vv -s 06:00.2 | grep driver
Kernel driver in use: xhci_hcd
lspci -vv -s 06:00.3 | grep driver
Configure the vfio driver as follows:
Configure the modules loaded at boot
To load the vfio-pci module at boot, edit /etc/modules-load.d/openstack-gpu.conf and add the following:
vfio_pci
pci_stub
vfio
vfio_iommu_type1
kvm
kvm_intel
Configure the devices claimed by vfio
Bind the devices identified above to the vfio driver.
Edit /etc/modprobe.d/vfio.conf and add the following:
options vfio-pci ids=10de:1e04,10de:10f7,10de:1ad6,10de:1ad7
Reboot the system:
reboot
Check the boot messages to confirm that the vfio module was loaded:
dmesg | grep -i vfio
[ 6.755346] VFIO - User Level meta-driver version: 0.3
[ 6.803197] vfio_pci: add [10de:1b06[ffff:ffff]] class 0x000000/00000000
[ 6.803306] vfio_pci: add [10de:10ef[ffff:ffff]] class 0x000000/00000000
After the reboot, check which driver each device uses; vfio-pci means the configuration is correct:
lspci -vv -s 06:00.0 | grep driver
Kernel driver in use: vfio-pci
lspci -vv -s 06:00.1 | grep driver
Kernel driver in use: vfio-pci
lspci -vv -s 06:00.2 | grep driver
Kernel driver in use: xhci_hcd
lspci -vv -s 06:00.3 | grep driver
Kernel driver in use: vfio-pci
Hide the hypervisor ID from the virtual machine
The NVIDIA driver detects whether it is running inside a virtual machine and fails if it is, so the hypervisor ID has to be hidden from the driver. Starting with the Pike release of OpenStack, Glance images support the img_hide_hypervisor_id=true property, so the hypervisor ID can be hidden by running the following against the image:
openstack image set [IMG-UUID] --property img_hide_hypervisor_id=true
Boot the instance.
Instances built from this image will hide the hypervisor ID. You can check whether it is hidden with:
cpuid | grep hypervisor_id
hypervisor_id = "KVMKVMKVM "
hypervisor_id = "KVMKVMKVM "
The output above means it is not hidden; the output below means it has been hidden:
cpuid | grep hypervisor_id
hypervisor_id = " @ @ "
hypervisor_id = " @ @ "
On the controller node, modify the nova-api configuration and add the other three PCI functions as well:
[pci]
alias = {"name":"a1","product_id":"1e04","vendor_id":"10de","device_type":"type-PCI"}
alias = {"name":"a2","product_id":"10f7","vendor_id":"10de","device_type":"type-PCI"}
alias = {"name":"a3","product_id":"1ad6","vendor_id":"10de","device_type":"type-PCI"}
alias = {"name":"a4","product_id":"1ad7","vendor_id":"10de","device_type":"type-PCI"}
Modify the nova-scheduler configuration: add PciPassthroughFilter and also add available_filters = nova.scheduler.filters.all_filters, as follows:
[filter_scheduler]
host_subset_size = 10
max_io_ops_per_host = 10
enabled_filters = RetryFilter,AvailabilityZoneFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,AggregateCoreFilter,AggregateDiskFilter,DifferentHostFilter,SameHostFilter,PciPassthroughFilter
available_filters = nova.scheduler.filters.all_filters
Restart the services:
systemctl restart openstack-nova-api openstack-nova-scheduler
Configure the compute node by editing nova.conf as follows:
[pci]
alias = {"name":"a1","product_id":"1e04","vendor_id":"10de","device_type":"type-PCI"}
alias = {"name":"a2","product_id":"10f7","vendor_id":"10de","device_type":"type-PCI"}
alias = {"name":"a3","product_id":"1ad6","vendor_id":"10de","device_type":"type-PCI"}
alias = {"name":"a4","product_id":"1ad7","vendor_id":"10de","device_type":"type-PCI"}
passthrough_whitelist = [{ "vendor_id": "10de", "product_id": "1e04" },
{ "vendor_id": "10de", "product_id": "10f7" },
{ "vendor_id": "10de", "product_id": "1ad6" },
{ "vendor_id": "10de", "product_id": "1ad7" }]
Restart the compute node service:
docker restart nova_compute
Create a flavor with the GPU passthrough aliases:
openstack flavor create --ram 2048 --disk 20 --vcpus 2 m1.large
openstack flavor set m1.large --property pci_passthrough:alias='a1:1,a2:1,a3:1,a4:1'
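After booting an instance with this flavor (the same openstack server create pattern as in the NIC example earlier), two hedged checks confirm placement and passthrough; the instance name gpu-test is a placeholder:
openstack server show gpu-test -c status -c OS-EXT-SRV-ATTR:host    # should be ACTIVE on the GPU compute node
# inside the guest, all four NVIDIA functions should be visible:
lspci -nn | grep 10de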
**For the host kernel configuration, refer to section one above.**
1. The T4 card supports vGPU by default, so by default it is presented as a PF. The configuration therefore differs slightly: set device_type to type-PF, as follows.
Modify the nova-api configuration on the controller node:
[pci]
alias = {"vendor_id":"8086","product_id":"10fb","device_type":"type-PF","name":"a1"}
Modify the nova-scheduler configuration: add PciPassthroughFilter, add available_filters = nova.scheduler.filters.all_filters, and add the [pci] section, as follows:
[filter_scheduler]
host_subset_size = 10
max_io_ops_per_host = 10
enabled_filters = RetryFilter,AvailabilityZoneFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,AggregateCoreFilter,AggregateDiskFilter,DifferentHostFilter,SameHostFilter,PciPassthroughFilter
available_filters = nova.scheduler.filters.all_filters
[pci]
alias = {"vendor_id":"8086","product_id":"10fb","device_type":"type-PF","name":"a1"}
Restart the services:
docker restart nova_api nova_scheduler
Modify nova.conf on the compute node as follows:
[pci]
passthrough_whitelist = { "vendor_id":"8086","product_id":"10fb" }
alias = { "vendor_id":"8086", "product_id":"10fb", "device_type":"type-PF", "name":"a1" }
Restart the nova-compute service:
docker restart nova_compute
Check the pci_devices table in the nova database:
MariaDB [nova]> select * from pci_devices\G
*************************** 1. row ***************************
     created_at: 2020-12-11 08:05:35
     updated_at: 2020-12-11 08:07:10
     deleted_at: NULL
        deleted: 0
             id: 18
compute_node_id: 3
        address: 0000:d8:00.0
     product_id: 1eb8
      vendor_id: 10de
       dev_type: type-PF
         dev_id: pci_0000_d8_00_0
          label: label_10de_1eb8
         status: available
     extra_info: {}
  instance_uuid: NULL
     request_id: NULL
      numa_node: 1
    parent_addr: NULL
           uuid: 68043d3f-153b-4be6-b341-9f02f8fe7ffd
1 row in set (0.000 sec)
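If the table grows large, a narrower query over the same columns shown above makes it easier to see whether a device has been claimed (just a convenience variant of the query already used):
MariaDB [nova]> select address, product_id, dev_type, status, instance_uuid from pci_devices where deleted = 0;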
Confirm that the GPU is using the vfio-pci driver
Disable the default nouveau driver by editing /etc/modprobe.d/blacklist.conf:
echo -e "blacklist nouveau\noptions nouveau modeset=0" > /etc/modprobe.d/blacklist.conf
Check which driver the GPU is currently using:
[root@sxjn-icontron01 ~]# lspci -vv -s d8:00.0 | grep driver
Kernel driver in use: pci-stub
In this environment the device is not bound to vfio-pci, so it has to be switched to vfio-pci as follows:
To load the vfio-pci module at boot, edit /etc/modules-load.d/openstack-gpu.conf and add the following:
vfio_pci
pci_stub
vfio
vfio_iommu_type1
kvm
kvm_intel
Edit /etc/modprobe.d/vfio.conf and add the following:
options vfio-pci ids=10de:1e04
Reboot the system:
reboot
After the reboot, confirm that vfio-pci has claimed the device:
[root@sxjn-icontron01 ~]# lspci -vv -s d8:00.0 | grep driver
Kernel driver in use: vfio-pci
Set the PCI alias metadata on the flavor:
openstack flavor set ml.large --property "pci_passthrough:alias"="a1:1"
Create a virtual machine with this flavor.
The GPU must first be passed through to the virtual machine; refer to the passthrough steps described above.
The steps inside the virtual machine are as follows:
1. System configuration:
Install the base packages:
yum install dkms gcc kernel-devel kernel-headers    # these must match the running kernel version, otherwise the driver module will fail to build and load later
Disable the default nouveau driver by editing /etc/modprobe.d/blacklist.conf:
echo -e "blacklist nouveau\noptions nouveau modeset=0" > /etc/modprobe.d/blacklist.conf
Back up the original initramfs image:
mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
Rebuild the initramfs image:
dracut /boot/initramfs-$(uname -r).img $(uname -r)
Reboot the system:
reboot
# Check whether nouveau is still loaded; empty output means it was disabled successfully
lsmod | grep nouveau
2. Install the GPU driver as follows:
Download the driver matching the kernel version from https://www.nvidia.cn/Download/index.aspx?lang=cn, then install it:
sh NVIDIA-Linux-x86_64-450.80.02.run --kernel-source-path=/usr/src/kernels/3.10.0-514.el7.x86_64 -k $(uname -r) --dkms -s -no-x-check -no-nouveau-check -no-opengl-files
After the driver is installed, run the following command to confirm it works:
[root@gpu-3 nvdia-docker]# nvidia-smi
Tue Dec 15 21:51:35 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:06.0 Off |                    0 |
| N/A   73C    P0    29W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
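In addition to nvidia-smi, two quick optional checks (assuming the --dkms install above) that the module really built against the running kernel:
lsmod | grep ^nvidia       # the nvidia kernel modules should be loaded
dkms status                # the nvidia module should be listed as installed for the running kernel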
3. Install cuda-drivers and cuda. This environment uses CUDA 11.1 with NVIDIA driver 455.45.01.
yum install cuda cuda-drivers nvidia-driver-latest-dkms
Download the offline installation packages locally in advance. Reference:
https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&target_distro=CentOS&target_version=7&target_type=rpmlocal
Any missing packages can be downloaded from:
https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/
4. Install Docker
yum install docker-ce nvidia-docker2     # the nvidia-docker2 version must not be too old
Edit daemon.json and confirm the configuration looks like this:
[root@gpu-2 nvidia]# vim /etc/docker/daemon.json
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
[root@gpu-1 cuda]# systemctl daemon-reload
[root@gpu-1 cuda]# systemctl restart docker
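Before pulling any image, a small sanity check (not part of the original steps) that the nvidia runtime from daemon.json was registered:
docker info | grep -i runtime     # the output should list "nvidia" alongside the default runc runtime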
5. Pull an image that ships with the CUDA runtime and test it:
docker pull nvidia/cuda:11.0-base    # pull the Docker image
[root@gpu-1 cuda]# nvidia-docker run -it image_id /bin/bash
root@f244e0a31a90:/# nvidia-smi
Wed Dec 16 03:11:54 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.45.01    Driver Version: 455.45.01    CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:06.0 Off |                    0 |
| N/A   56C    P0    19W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
root@f244e0a31a90:/# exit
**1: Use lsusb to list the USB devices on the host:**
Bus 001 Device 002: ID 8087:0020 Intel Corp. Integrated Rate Matching Hub
Bus 002 Device 002: ID 8087:0020 Intel Corp. Integrated Rate Matching Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 004: ID 0930:6544 Toshiba Corp. Kingston DataTraveler 2.0 Stick (2GB)
2: Write a usb.xml file. Method one, address the device by vendor/product ID:
<hostdev mode='subsystem' type='usb'>
  <source>
    <vendor id='0x0930'/>    <!-- vendor ID -->
    <product id='0x6544'/>   <!-- product ID -->
  </source>
</hostdev>
Method two, address the device by its USB bus/device number:
<hostdev mode='subsystem' type='usb'>
  <source>
    <address bus='002' device='004'/>    <!-- USB bus address -->
  </source>
</hostdev>
3: Attach the device to the virtual machine:
virsh attach-device instance-00000001 usb.xml
Log in to the virtual machine and the corresponding USB device is visible; the guest recognizes the USB stick as sdb.
4: Detach the device:
virsh detach-device instance-00000001 usb.xml
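Note that virsh attach-device without extra flags only affects the running domain, so the USB device is gone from the definition after the guest restarts. If persistence is wanted, the standard virsh flags can be added, for example:
virsh attach-device instance-00000001 usb.xml --live --config    # apply now and also keep it in the persistent domain XML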