
A Brief Look at RDMA Technology (Part 3)


Environment

Enough armchair theory; let's actually run some RDMA tests. My company happens to have Mellanox NICs. The card is:

    [root@localhost ~]# lspci -vvv |grep Eth
    01:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
    01:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
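The two lines above are the two PCI functions of a single dual-port card. With the mlx5 driver, each port shows up as its own PCI function and its own RDMA device. That is why the self-test below detects two CAs (mlx5_0 and mlx5_1). Here is a minimal sketch that picks the Mellanox functions out of such output. It assumes lspci's default `bus:device.function` format without a PCI domain prefix, and the helper name `mellanox_functions` is hypothetical:

```python
import re
from typing import List, Tuple

# lspci prints "bus:device.function class: description" (no PCI domain by default), e.g.
# "01:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]"
LSPCI_LINE = re.compile(
    r"^(?P<addr>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s+(?P<cls>[^:]+):\s+(?P<desc>.+)$"
)


def mellanox_functions(lspci_output: str) -> List[Tuple[str, str]]:
    """Return (PCI address, description) for each Mellanox PCI function (hypothetical helper)."""
    found = []
    for line in lspci_output.splitlines():
        m = LSPCI_LINE.match(line.strip())
        # Shell prompts and unrelated lines do not match the address pattern.
        if m and m.group("desc").startswith("Mellanox"):
            found.append((m.group("addr"), m.group("desc")))
    return found
```

Fed the output above, it returns one entry for 01:00.0 and one for 01:00.1, one per port.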

Linux version:

    [root@localhost ~]# cat /etc/redhat-release
    CentOS Linux release 7.9.2009 (Core)
    [root@localhost ~]# uname -r
    3.10.0-1160.el7.x86_64

Firmware version: 14.23.1020. This is the "Current FW version on flash" reported by flint before the firmware update described later.

Installing OFED

Mellanox OFED can be downloaded here:

https://network.nvidia.com/products/infiniband-drivers/linux/mlnx_ofed/

Download the package that matches your operating system, then unpack and install it:

    tar xvf MLNX_OFED_SRC-5.5-1.0.3.2.tgz
    cd MLNX_OFED_SRC-5.5-1.0.3.2/
    ./install.pl

After installation, hca_self_test.ofed shows the node GUIDs and a series of PASS results:

    [root@localhost MLNX_OFED_SRC-5.5-1.0.3.2]# hca_self_test.ofed
    ---- Performing Adapter Device Self Test ----
    Number of CAs Detected ................. 2
    PCI Device Check ....................... PASS
    Kernel Arch ............................ x86_64
    Host Driver Version .................... OFED-internal-5.5-1.0.3: 3.10.0-1160.el7.x86_64
    Host Driver RPM Check .................. PASS
    Firmware on CA #0 NIC .................. v14.23.1020
    Firmware on CA #1 NIC .................. v14.23.1020
    Host Driver Initialization ............. PASS
    Number of CA Ports Active .............. 0
    Port State of Port #1 on CA #0 (NIC)..... DOWN (Ethernet)
    Port State of Port #1 on CA #1 (NIC)..... DOWN (Ethernet)
    Error Counter Check on CA #0 (NIC)...... PASS
    Error Counter Check on CA #1 (NIC)...... PASS
    Kernel Syslog Check .................... PASS
    Node GUID on CA #0 (NIC) ............... 98:03:9b:03:00:48:bd:c8
    Node GUID on CA #1 (NIC) ............... 98:03:9b:03:00:48:bd:c9
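Every check in the self-test passes, yet "Number of CA Ports Active" is 0 and both ports report DOWN (Ethernet). A small parser makes that easy to spot when checking several machines. This sketch assumes the `label ..... value` layout printed here; other OFED releases may format the report differently, and all function names are hypothetical:

```python
import re
from typing import Dict, List, Tuple

# "Label ............ value" rows; the dot leader may or may not follow a space.
ROW = re.compile(r"^(?P<key>.+?)\s*\.{2,}\s*(?P<val>.*)$")
PORT = re.compile(r"Port State of Port #(\d+) on CA #(\d+)")


def parse_self_test(text: str) -> Dict[str, str]:
    """Map each report label to its value; lines without a dot leader are skipped."""
    rows = {}
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            rows[m.group("key").strip()] = m.group("val").strip()
    return rows


def port_states(rows: Dict[str, str]) -> Dict[Tuple[int, int], Tuple[str, str]]:
    """Return {(ca, port): (state, link type)} from the 'Port State' rows."""
    states = {}
    for key, val in rows.items():
        m = PORT.match(key)
        if m:
            state, _, link = val.partition(" ")  # "DOWN (Ethernet)" -> "DOWN", "(Ethernet)"
            states[(int(m.group(2)), int(m.group(1)))] = (state, link.strip("()"))
    return states


def failed_checks(rows: Dict[str, str]) -> List[str]:
    """Labels containing 'Check' whose value is not PASS."""
    return [k for k, v in rows.items() if "Check" in k and v != "PASS"]
```

On the report above, failed_checks returns an empty list. port_states returns ("DOWN", "Ethernet") for both CAs, so the problem is the link, not the driver.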

A few more commands show the IB device status:

    [root@localhost MLNX_OFED_SRC-5.5-1.0.3.2]# ibdev2netdev   # shows which Ethernet interface each IB device/port maps to
    mlx5_0 port 1 ==> eth1 (Down)
    mlx5_1 port 1 ==> eth2 (Down)
    [root@localhost MLNX_OFED_SRC-5.5-1.0.3.2]# ibv_devinfo
    hca_id: mlx5_0
        transport:       InfiniBand (0)   # IB transport (the verbs transport type)
        fw_ver:          14.23.1020
        node_guid:       9803:9b03:0048:bdc8
        sys_image_guid:  9803:9b03:0048:bdc8
        vendor_id:       0x02c9
        vendor_part_id:  4117
        hw_ver:          0x0
        board_id:        MT_2420110034
        phys_port_cnt:   1
            port:   1
                state:       PORT_DOWN (1)
                max_mtu:     4096 (5)
                active_mtu:  1024 (3)
                sm_lid:      0
                port_lid:    0
                port_lmc:    0x00
                link_layer:  Ethernet

    hca_id: mlx5_1
        transport:       InfiniBand (0)
        fw_ver:          14.23.1020
        node_guid:       9803:9b03:0048:bdc9
        sys_image_guid:  9803:9b03:0048:bdc8
        vendor_id:       0x02c9
        vendor_part_id:  4117
        hw_ver:          0x0
        board_id:        MT_2420110034
        phys_port_cnt:   1
            port:   1
                state:       PORT_DOWN (1)
                max_mtu:     4096 (5)
                active_mtu:  1024 (3)
                sm_lid:      0
                port_lid:    0
                port_lmc:    0x00
                link_layer:  Ethernet
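The `transport` field is the verbs transport type, and it reads InfiniBand for RoCE devices too. Whether a port really runs InfiniBand is shown by `link_layer`. The sketch below groups the fields per `hca_id`, classifies each device, and decodes the MTU enum printed in parentheses. The helper names are hypothetical, and the parser is written only against the output format shown above. It keeps one flat set of port fields per device, which is enough for these single-port HCAs:

```python
import re
from typing import Dict, Optional

# ibv_devinfo prints the MTU enum in parentheses, e.g. "1024 (3)".
MTU_ENUM = {1: 256, 2: 512, 3: 1024, 4: 2048, 5: 4096}


def parse_devinfo(text: str) -> Dict[str, Dict[str, str]]:
    """Group 'key: value' lines under the preceding hca_id.

    Port fields are stored flat, so with several ports per HCA the last port wins.
    """
    devices: Dict[str, Dict[str, str]] = {}
    current: Optional[Dict[str, str]] = None
    for line in text.splitlines():
        # Split on the first colon only, so GUIDs like 9803:9b03:... stay intact.
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "hca_id":
            current = devices.setdefault(value, {})
        elif current is not None:
            current[key] = value
    return devices


def link_kind(dev: Dict[str, str]) -> str:
    """Classify a device from its transport and link_layer fields."""
    transport = dev.get("transport", "")
    link = dev.get("link_layer", "")
    if transport.startswith("InfiniBand") and link == "Ethernet":
        return "RoCE"
    if transport.startswith("InfiniBand") and link == "InfiniBand":
        return "InfiniBand"
    return "other"


def active_mtu_bytes(dev: Dict[str, str]) -> Optional[int]:
    """Decode the enum in parentheses of active_mtu, e.g. '1024 (3)' -> 1024."""
    m = re.search(r"\((\d+)\)", dev.get("active_mtu", ""))
    return MTU_ENUM.get(int(m.group(1))) if m else None
```

For both mlx5_0 and mlx5_1 above, link_kind gives "RoCE", because the transport is InfiniBand while link_layer is Ethernet.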

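From the output above, the ports are still PORT_DOWN, and link_layer is Ethernet rather than IB. Posts online say LINK_TYPE_P1 has to be set to 1 (1 = IB mode, 2 = Ethernet mode), so I queried the current setting:

    [root@localhost ~]# mlxconfig -d /dev/mst/mt4117_pciconf0 query |grep LINK

However, no LINK_TYPE_P1 option shows up.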
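On VPI adapters (cards that can run either IB or Ethernet), `mlxconfig ... query` lists one `LINK_TYPE_P<n>` parameter per port: 1 = IB, 2 = ETH, and 3 = VPI where supported. The port type is then changed with `mlxconfig -d <device> set LINK_TYPE_P1=1` followed by a reboot. An empty grep therefore means the firmware exposes no port-type setting at all. The check can be sketched as below. It assumes the usual single-column `NAME   VALUE(n)` line layout, and the function name is hypothetical:

```python
import re
from typing import Dict

# LINK_TYPE_P<n> values on VPI adapters: 1 = IB, 2 = ETH, 3 = VPI (where supported).
LINK_TYPES = {1: "IB", 2: "ETH", 3: "VPI"}

# Assumes a single value column, e.g. "         LINK_TYPE_P1        ETH(2)".
PARAM = re.compile(r"^\s*LINK_TYPE_P(\d+)\s+\S*?\((\d+)\)\s*$")


def link_types(query_output: str) -> Dict[int, str]:
    """Return {port: link type} for each LINK_TYPE_P<n> line; {} means no such setting."""
    ports = {}
    for line in query_output.splitlines():
        m = PARAM.match(line)
        if m:
            ports[int(m.group(1))] = LINK_TYPES.get(int(m.group(2)), "unknown")
    return ports
```

For example, a line such as `LINK_TYPE_P1    ETH(2)` from a VPI card would yield `{1: "ETH"}`. The ConnectX-4 Lx query here contains no such line, so the result would be an empty dict.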

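I suspected the firmware version was the problem.

Trying a firmware update

Some searching showed that this needs the MFT (Mellanox Firmware Tools) package, which provides mst, flint and mlxconfig: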

https://network.nvidia.com/products/adapter-software/firmware-tools/

    tar xvf mft-4.18.0-106-x86_64-rpm.tgz
    cd mft-4.18.0-106-x86_64-rpm/
    ./install.sh
    mst start
    service mst status
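The `-d /dev/mst/mt4117_pciconf0` argument used with flint and mlxconfig is one of the device files that `mst start` creates and `mst status` lists. The number after `mt` is the PCI device ID in decimal: 4117 is 0x1015, the same value ibv_devinfo prints as `vendor_part_id`. Below is a minimal sketch of pulling those paths out of `mst status` output. The function name is hypothetical, and the regex assumes device files named `mt<id>_pciconf<n>` as on this machine:

```python
import re
from typing import List, Tuple

# Device files look like /dev/mst/mt4117_pciconf0 (optionally ".1" for another function);
# the number after "mt" is the PCI device ID in decimal (4117 == 0x1015).
MST_DEV = re.compile(r"(/dev/mst/mt(\d+)_pciconf\d+(?:\.\d+)?)")


def mst_devices(status_output: str) -> List[Tuple[str, int]]:
    """Return unique (device file, PCI device ID) pairs in order of appearance."""
    found: List[Tuple[str, int]] = []
    for m in MST_DEV.finditer(status_output):
        entry = (m.group(1), int(m.group(2)))
        if entry not in found:
            found.append(entry)
    return found
```

The returned path can be passed straight to `flint -d` or `mlxconfig -d`.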

Download the latest firmware from:

https://network.nvidia.com/support/firmware/connectx4lxen/

    [root@localhost bak]# flint -d /dev/mst/mt4117_pciconf0 -i fw-ConnectX4Lx-rel-14_31_1014-MCX4121A-ACA_Ax-UEFI-14.24.13-FlexBoot-3.6.403.bin burn
    Current FW version on flash: 14.23.1020
    New FW version: 14.31.1014
    FSMST_INITIALIZE - OK
    Writing Boot image component - OK
    -I- To load new FW run mlxfwreset or reboot machine.
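The burn log shows an upgrade from 14.23.1020 to 14.31.1014. As its last line says, though, the new image only takes effect after mlxfwreset or a reboot. Until then, tools that read the running firmware keep reporting the old version. When scripting such checks, compare versions numerically rather than as strings, because "14.9" sorts after "14.10" as text. A sketch with hypothetical helper names:

```python
import re
from typing import Tuple

Version = Tuple[int, ...]


def parse_version(text: str) -> Version:
    """'14.23.1020' or 'v14.23.1020' -> (14, 23, 1020)."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def burn_versions(flint_output: str) -> Tuple[Version, Version]:
    """Extract (current, new) firmware versions from a flint burn log."""
    cur = re.search(r"Current FW version on flash:\s*(\d+(?:\.\d+)*)", flint_output)
    new = re.search(r"New FW version:\s*(\d+(?:\.\d+)*)", flint_output)
    if not (cur and new):
        raise ValueError("both firmware versions must appear in the flint output")
    return parse_version(cur.group(1)), parse_version(new.group(1))
```

On the log above, burn_versions returns ((14, 23, 1020), (14, 31, 1014)), and the tuple comparison new > cur confirms an upgrade. parse_version also accepts the "v14.23.1020" form printed by hca_self_test.ofed.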

This made no difference.

I also tried an older driver, replacing OFED 5.5 with 5.1, but it still didn't work.

Later I found the following on this page:

https://access.redhat.com/articles/3082811

Note that the card in the example output is an Ethernet-only card, so there is no port type setting.

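This states that the ConnectX-4 Lx is an Ethernet-only card and does not support IB. But then why does ibv_devinfo report the transport as IB?

    transport: InfiniBand (0)

That looked very strange. In fact, the transport field is the verbs transport type, which is InfiniBand for RoCE devices as well. Whether the port actually runs InfiniBand is given by link_layer, which is Ethernet here.

It looks like I can't do this test with this card.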

Moreover, the ConnectX-4 Lx and the ConnectX-4 are both mlx5 devices, so I would have expected IB support to be native. Why build a board that lacks it?

https://mymellanox.force.com/mellanoxcommunity/s/question/0D51T00008dGyJMSA0/how-to-use-mellanox-connectx4-lx

That thread describes the same problem:

Unfortunately, I'm starting to think that I have the wrong card (and that this only works for Ethernet), because I am unable to change the link type of this card to infiniband. I have followed all the instructions, but it says that the option (LINK_TYPE) isn't found when I try via the command line.

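Disclaimer: This article was contributed by a user and does not represent the position of the wpsshop blog. Copyright belongs to the original author, and the site assumes no legal liability. If you find infringing content, please contact us. When reposting, please cite the source: https://www.wpsshop.cn/w/很楠不爱3/article/detail/470813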