After all this theory, let's actually run an RDMA test. The company happens to have Mellanox NICs on hand; the card is:
- [root@localhost ~]# lspci -vvv |grep Eth
- 01:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
- 01:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
- [root@localhost ~]# cat /etc/redhat-release
- CentOS Linux release 7.9.2009 (Core)
- [root@localhost ~]# uname -r
- 3.10.0-1160.el7.x86_64
The firmware version is shown by flint (current version on flash: 14.23.1020):
- [root@localhost bak]# flint -d /dev/mst/mt4117_pciconf0 -i fw-ConnectX4Lx-rel-14_31_1014-MCX4121A-ACA_Ax-UEFI-14.24.13-FlexBoot-3.6.403.bin burn
- Current FW version on flash: 14.23.1020
- New FW version: 14.31.1014
-
- FSMST_INITIALIZE - OK
- Writing Boot image component - OK
- -I- To load new FW run mlxfwreset or reboot machine.
The Mellanox OFED download page is:
https://network.nvidia.com/products/infiniband-drivers/linux/mlnx_ofed/
Download the version that matches your operating system, then unpack and install it:
- tar xvf MLNX_OFED_SRC-5.5-1.0.3.2.tgz
- cd MLNX_OFED_SRC-5.5-1.0.3.2/
- ./install.pl
After installation, the self test reports the GUIDs and a number of PASS statuses:
- [root@localhost MLNX_OFED_SRC-5.5-1.0.3.2]# hca_self_test.ofed
- ---- Performing Adapter Device Self Test ----
- Number of CAs Detected ................. 2
- PCI Device Check ....................... PASS
- Kernel Arch ............................ x86_64
- Host Driver Version .................... OFED-internal-5.5-1.0.3: 3.10.0-1160.el7.x86_64
- Host Driver RPM Check .................. PASS
- Firmware on CA #0 NIC .................. v14.23.1020
- Firmware on CA #1 NIC .................. v14.23.1020
- Host Driver Initialization ............. PASS
- Number of CA Ports Active .............. 0
- Port State of Port #1 on CA #0 (NIC)..... DOWN (Ethernet)
- Port State of Port #1 on CA #1 (NIC)..... DOWN (Ethernet)
- Error Counter Check on CA #0 (NIC)...... PASS
- Error Counter Check on CA #1 (NIC)...... PASS
- Kernel Syslog Check .................... PASS
- Node GUID on CA #0 (NIC) ............... 98:03:9b:03:00:48:bd:c8
- Node GUID on CA #1 (NIC) ............... 98:03:9b:03:00:48:bd:c9
A few commands are available to check the IB state:
- [root@localhost MLNX_OFED_SRC-5.5-1.0.3.2]# ibdev2netdev //shows the mapping between Ethernet devices and IB devices/ports
- mlx5_0 port 1 ==> eth1 (Down)
- mlx5_1 port 1 ==> eth2 (Down)
- [root@localhost MLNX_OFED_SRC-5.5-1.0.3.2]# ibv_devinfo
- hca_id: mlx5_0
- transport: InfiniBand (0) //IB protocol
- fw_ver: 14.23.1020
- node_guid: 9803:9b03:0048:bdc8
- sys_image_guid: 9803:9b03:0048:bdc8
- vendor_id: 0x02c9
- vendor_part_id: 4117
- hw_ver: 0x0
- board_id: MT_2420110034
- phys_port_cnt: 1
- port: 1
- state: PORT_DOWN (1)
- max_mtu: 4096 (5)
- active_mtu: 1024 (3)
- sm_lid: 0
- port_lid: 0
- port_lmc: 0x00
- link_layer: Ethernet
- hca_id: mlx5_1
- transport: InfiniBand (0)
- fw_ver: 14.23.1020
- node_guid: 9803:9b03:0048:bdc9
- sys_image_guid: 9803:9b03:0048:bdc8
- vendor_id: 0x02c9
- vendor_part_id: 4117
- hw_ver: 0x0
- board_id: MT_2420110034
- phys_port_cnt: 1
- port: 1
- state: PORT_DOWN (1)
- max_mtu: 4096 (5)
- active_mtu: 1024 (3)
- sm_lid: 0
- port_lid: 0
- port_lmc: 0x00
- link_layer: Ethernet
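The two fields that matter in this output are state and link_layer. As a rough sketch (not part of the original test), the output can be condensed to one line per port. The function name summarize_ports is made up for illustration, and the script assumes the field layout shown above.

```shell
#!/usr/bin/env bash
# Condense `ibv_devinfo` output (read from stdin) to one line per port:
#   <hca_id> <state> <link_layer>
# Assumption: fields appear as "hca_id:", "state:" and "link_layer:",
# with link_layer being the last field printed for each port.
summarize_ports() {
    awk '
        $1 == "hca_id:"     { hca = $2 }            # remember current device
        $1 == "state:"      { state = $2 }          # e.g. PORT_DOWN
        $1 == "link_layer:" { print hca, state, $2 } # emit one line per port
    '
}

# Usage on a host with the OFED tools installed:
#   ibv_devinfo | summarize_ports
```

On this machine both ports would be reported as PORT_DOWN with an Ethernet link layer, which matches what the full listing shows.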
From the output above, state is still PORT_DOWN and link_layer is Ethernet rather than IB. According to posts online, LINK_TYPE_P1 needs to be set to 1 (1 is IB mode, 2 is Ethernet mode):
[root@localhost ~]# mlxconfig -d /dev/mst/mt4117_pciconf0 query |grep LINK
But the output contains no LINK_TYPE_P1 option.
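The check above can be scripted so that the link type is only changed when the parameter really exists. This is a minimal sketch: the helper name has_mlxconfig_param is hypothetical, and it assumes that the parameter name is the first field on its line in the query output.

```shell
#!/usr/bin/env bash
# Read `mlxconfig -d <dev> query` output on stdin and succeed (exit 0)
# only if the given parameter name appears as the first field of a line.
# Assumption: the parameter name is the first whitespace-separated field.
has_mlxconfig_param() {
    local param="${1:-LINK_TYPE_P1}"
    awk -v p="$param" '$1 == p { found = 1 } END { exit !found }'
}

# Usage on the test host (device path taken from the commands above):
#   dev=/dev/mst/mt4117_pciconf0
#   if mlxconfig -d "$dev" query | has_mlxconfig_param LINK_TYPE_P1; then
#       mlxconfig -d "$dev" set LINK_TYPE_P1=1   # 1 = IB, 2 = Ethernet
#   else
#       echo "LINK_TYPE_P1 is not exposed by this card"
#   fi
```

Matching on the whole field rather than grepping for LINK avoids false positives from unrelated parameters whose names merely contain that string.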
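I suspected the firmware version might be the cause, so I tried updating the firmware.
From what I found online, this requires the MFT (Mellanox Firmware Tools) package, which provides the mst tool: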
https://network.nvidia.com/products/adapter-software/firmware-tools/
- tar xvf mft-4.18.0-106-x86_64-rpm.tgz
- cd mft-4.18.0-106-x86_64-rpm/
- ./install.sh
- mst start
- service mst status
Download the latest firmware:
https://network.nvidia.com/support/firmware/connectx4lxen/
- [root@localhost bak]# flint -d /dev/mst/mt4117_pciconf0 -i fw-ConnectX4Lx-rel-14_31_1014-MCX4121A-ACA_Ax-UEFI-14.24.13-FlexBoot-3.6.403.bin burn
- Current FW version on flash: 14.23.1020
- New FW version: 14.31.1014
-
- FSMST_INITIALIZE - OK
- Writing Boot image component - OK
- -I- To load new FW run mlxfwreset or reboot machine.
It made no difference.
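One thing that may be worth ruling out here: flint's last message says the new image is only loaded after mlxfwreset or a reboot, so the card could still be running 14.23.1020. The sketch below compares the fw_ver values reported by ibv_devinfo against the expected version. Both function names are invented for illustration.

```shell
#!/usr/bin/env bash
# Print every fw_ver value found in `ibv_devinfo` output read from stdin.
running_fw_versions() {
    awk '$1 == "fw_ver:" { print $2 }'
}

# Read one version per line from stdin; succeed only if at least one
# version was seen and all of them equal the expected version.
fw_matches() {
    local expected="$1" v seen=0
    while read -r v; do
        seen=1
        [ "$v" = "$expected" ] || return 1
    done
    [ "$seen" -eq 1 ]
}

# Usage after burning and resetting (version string from the flint output):
#   ibv_devinfo | running_fw_versions | fw_matches 14.31.1014 \
#       && echo "new firmware is running" \
#       || echo "old firmware still loaded; run mlxfwreset or reboot"
```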
I also replaced the 5.5 driver with the older 5.1 release, and that did not help either.
Later I came across the following on this page:
https://access.redhat.com/articles/3082811
Note that the card in the example output is an Ethernet-only card, so there is no port type setting.
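This suggests that the ConnectX-4 Lx is an Ethernet-only card that does not support IB. But then why does ibv_devinfo report the transport as IB? Very strange.
transport: InfiniBand (0)
It looks like this test cannot be done on this card.
Besides, ConnectX-4 Lx and ConnectX-4 both use the mlx5 driver and ought to support IB natively, so why build a board without IB support?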
Another page reports the same problem:
Unfortunately, I'm starting to think that I have the wrong card (and that this only works for Ethernet), because I am unable to change the link type of this card to infiniband. I have followed all the instructions, but it says that the option (LINK_TYPE) isn't found when I try via the command line.