.. SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB

=================================================
Mellanox ConnectX(R) mlx5 core VPI Network Driver
=================================================

Copyright (c) 2019, Mellanox Technologies LTD.

Contents
========

- `Enabling the driver and kconfig options`_
- `Devlink info`_
- `Devlink parameters`_
- `Devlink health reporters`_
- `mlx5 tracepoints`_

Enabling the driver and kconfig options
================================================

| mlx5 core is modular and most of the major mlx5 core driver features can be selected (compiled in/out)
| at build time via kernel Kconfig flags.
| Basic features, ethernet net device rx/tx offloads and XDP, are available with the most basic flags
| CONFIG_MLX5_CORE=y/m and CONFIG_MLX5_CORE_EN=y.
| For the list of advanced features please see below.

**CONFIG_MLX5_CORE=(y/m/n)** (module mlx5_core.ko)

| The driver can be enabled by choosing CONFIG_MLX5_CORE=y/m in kernel config.
| This will provide the mlx5 core driver for mlx5 ulps to interface with (mlx5e, mlx5_ib).


**CONFIG_MLX5_CORE_EN=(y/n)**

| Choosing this option will allow basic ethernet netdevice support with all of the standard rx/tx offloads.
| mlx5e is the mlx5 ulp driver which provides the netdevice kernel interface. When chosen, mlx5e will be
| built into mlx5_core.ko.


**CONFIG_MLX5_EN_ARFS=(y/n)**

| Enables Hardware-accelerated receive flow steering (arfs) support, and ntuple filtering.
| https://community.mellanox.com/s/article/howto-configure-arfs-on-connectx-4


**CONFIG_MLX5_EN_RXNFC=(y/n)**

| Enables ethtool receive network flow classification, which allows user defined
| flow rules to direct traffic into an arbitrary rx queue via the ethtool set/get_rxnfc API.


**CONFIG_MLX5_CORE_EN_DCB=(y/n)**

| Enables `Data Center Bridging (DCB) Support <https://community.mellanox.com/s/article/howto-auto-config-pfc-and-ets-on-connectx-4-via-lldp-dcbx>`_.


**CONFIG_MLX5_MPFS=(y/n)**

| Ethernet Multi-Physical Function Switch (MPFS) support in ConnectX NIC.
| MPFS is required when `Multi-Host <http://www.mellanox.com/page/multihost>`_ configuration is enabled, to allow passing
| user configured unicast MAC addresses to the requesting PF.


**CONFIG_MLX5_ESWITCH=(y/n)**

| Ethernet SRIOV E-Switch support in ConnectX NIC. E-Switch provides internal SRIOV packet steering
| and switching for the enabled VFs and PF in two available modes:
|   1) `Legacy SRIOV mode (L2 mac vlan steering based) <https://community.mellanox.com/s/article/howto-configure-sr-iov-for-connectx-4-connectx-5-with-kvm--ethernet-x>`_.
|   2) `Switchdev mode (eswitch offloads) <https://www.mellanox.com/related-docs/prod_software/ASAP2_Hardware_Offloading_for_vSwitches_User_Manual_v4.4.pdf>`_.


**CONFIG_MLX5_CORE_IPOIB=(y/n)**

| IPoIB offloads & acceleration support.
| Requires CONFIG_MLX5_CORE_EN to provide an accelerated interface for the rdma
| IPoIB ulp netdevice.


**CONFIG_MLX5_FPGA=(y/n)**

| Build support for the Innova family of network cards by Mellanox Technologies.
| Innova network cards are comprised of a ConnectX chip and an FPGA chip on one board.
| If you select this option, the mlx5_core driver will include the Innova FPGA core and allow
| building sandbox-specific client drivers.


**CONFIG_MLX5_EN_IPSEC=(y/n)**

| Enables `IPSec XFRM cryptography-offload acceleration <http://www.mellanox.com/related-docs/prod_software/Mellanox_Innova_IPsec_Ethernet_Adapter_Card_User_Manual.pdf>`_.

**CONFIG_MLX5_EN_TLS=(y/n)**

| TLS cryptography-offload acceleration.


**CONFIG_MLX5_INFINIBAND=(y/n/m)** (module mlx5_ib.ko)

| Provides low-level InfiniBand/RDMA and `RoCE <https://community.mellanox.com/s/article/recommended-network-configuration-examples-for-roce-deployment>`_ support.


**External options** (Choose if the corresponding mlx5 feature is required)

- CONFIG_PTP_1588_CLOCK: When chosen, mlx5 ptp support will be enabled.
- CONFIG_VXLAN: When chosen, mlx5 vxlan support will be enabled.
- CONFIG_MLXFW: When chosen, mlx5 firmware flashing support will be enabled (via devlink and ethtool).

Devlink info
============

The devlink info reports the running and stored firmware versions on the device.
It also prints the device PSID, which represents the HCA board type ID.

User command example::

    $ devlink dev info pci/0000:00:06.0
      pci/0000:00:06.0:
      driver mlx5_core
      versions:
          fixed:
            fw.psid MT_0000000009
          running:
            fw.version 16.26.0100
          stored:
            fw.version 16.26.0100

Devlink parameters
==================

flow_steering_mode: Device flow steering mode
---------------------------------------------
The flow steering mode parameter controls the flow steering mode of the driver.
Two modes are supported:

1. 'dmfs' - Device managed flow steering.
2. 'smfs' - Software/Driver managed flow steering.

In DMFS mode, the HW steering entities are created and managed through the
Firmware.
In SMFS mode, the HW steering entities are created and managed by the driver
directly in hardware, without firmware intervention.

SMFS mode is faster and provides a better rule insertion rate than the default DMFS mode.

User command examples:

- Set SMFS flow steering mode::

    $ devlink dev param set pci/0000:06:00.0 name flow_steering_mode value "smfs" cmode runtime

- Read device flow steering mode::

    $ devlink dev param show pci/0000:06:00.0 name flow_steering_mode
      pci/0000:06:00.0:
      name flow_steering_mode type driver-specific
      values:
         cmode runtime value smfs


Devlink health reporters
========================

tx reporter
-----------
The tx reporter is responsible for reporting and recovering from the following two error scenarios:

- TX timeout
    Report on kernel tx timeout detection.
    Recover by searching lost interrupts.
- TX error completion
    Report on error tx completion.
    Recover by flushing the TX queue and resetting it.

The TX reporter also supports an on-demand diagnose callback, which provides
real-time information on the status of its send queues.

User commands examples:

- Diagnose send queues status::

    $ devlink health diagnose pci/0000:82:00.0 reporter tx

NOTE: This command has valid output only when the interface is up; otherwise the command has empty output.

- Show the number of tx errors indicated, the number of recover flows that ended successfully,
  whether autorecover is enabled, and the grace period since the last recover::

    $ devlink health show pci/0000:82:00.0 reporter tx

rx reporter
-----------
The rx reporter is responsible for reporting and recovering from the following two error scenarios:

- RX queues initialization (population) timeout
    RX queues descriptors population on ring initialization is done in
    napi context via triggering an irq. In case of a failure to get
    the minimum amount of descriptors, a timeout would occur and it
    could be recoverable by polling the EQ (Event Queue).
- RX completions with errors (reported by HW on interrupt context)
    Report on rx completion error.
    Recover (if needed) by flushing the related queue and resetting it.

The RX reporter also supports an on-demand diagnose callback, which
provides real-time information on the status of its receive queues.

- Diagnose rx queues status, and corresponding completion queue::

    $ devlink health diagnose pci/0000:82:00.0 reporter rx

NOTE: This command has valid output only when the interface is up; otherwise the command has empty output.

- Show the number of rx errors indicated, the number of recover flows that ended successfully,
  whether autorecover is enabled, and the grace period since the last recover::

    $ devlink health show pci/0000:82:00.0 reporter rx

fw reporter
-----------
The fw reporter implements diagnose and dump callbacks.
It follows symptoms of fw error such as fw syndrome by triggering
fw core dump and storing it into the dump buffer.
The fw reporter diagnose command can be triggered any time by the user to check
current fw status.

User commands examples:

- Check fw health status::

    $ devlink health diagnose pci/0000:82:00.0 reporter fw

- Read FW core dump if already stored or trigger new one::

    $ devlink health dump show pci/0000:82:00.0 reporter fw

NOTE: This command can run only on the PF which has fw tracer ownership;
running it on another PF or any VF will return "Operation not permitted".

fw fatal reporter
-----------------
The fw fatal reporter implements dump and recover callbacks.
It follows fatal error indications by CR-space dump and recover flow.
The CR-space dump uses the vsc interface, which is valid even if the FW command
interface is not functional, which is the case in most FW fatal errors.
The recover function runs the recover flow, which reloads the driver and triggers fw
reset if needed.

User commands examples:

- Run fw recover flow manually::

    $ devlink health recover pci/0000:82:00.0 reporter fw_fatal

- Read FW CR-space dump if already stored or trigger new one::

    $ devlink health dump show pci/0000:82:00.1 reporter fw_fatal

NOTE: This command can run only on PF.

mlx5 tracepoints
================

The mlx5 driver provides internal trace points for tracking and debugging using
kernel tracepoints interfaces (refer to Documentation/trace/ftrace.rst).

For the list of supported mlx5 events, check /sys/kernel/debug/tracing/events/mlx5/

tc and eswitch offloads tracepoints:

- mlx5e_configure_flower: trace flower filter actions and cookies offloaded to mlx5::

    $ echo mlx5:mlx5e_configure_flower >> /sys/kernel/debug/tracing/set_event
    $ cat /sys/kernel/debug/tracing/trace
    ...
    tc-6535 [019] ...1 2672.404466: mlx5e_configure_flower: cookie=0000000067874a55 actions= REDIRECT

- mlx5e_delete_flower: trace flower filter actions and cookies deleted from mlx5::

    $ echo mlx5:mlx5e_delete_flower >> /sys/kernel/debug/tracing/set_event
    $ cat /sys/kernel/debug/tracing/trace
    ...
    tc-6569 [010] .N.1 2686.379075: mlx5e_delete_flower: cookie=0000000067874a55 actions= NULL

- mlx5e_stats_flower: trace flower stats request::

    $ echo mlx5:mlx5e_stats_flower >> /sys/kernel/debug/tracing/set_event
    $ cat /sys/kernel/debug/tracing/trace
    ...
    tc-6546 [010] ...1 2679.704889: mlx5e_stats_flower: cookie=0000000060eb3d6a bytes=0 packets=0 lastused=4295560217

- mlx5e_tc_update_neigh_used_value: trace tunnel rule neigh update value offloaded to mlx5::

    $ echo mlx5:mlx5e_tc_update_neigh_used_value >> /sys/kernel/debug/tracing/set_event
    $ cat /sys/kernel/debug/tracing/trace
    ...
    kworker/u48:4-8806 [009] ...1 55117.882428: mlx5e_tc_update_neigh_used_value: netdev: ens1f0 IPv4: 1.1.1.10 IPv6: ::ffff:1.1.1.10 neigh_used=1

- mlx5e_rep_neigh_update: trace neigh update tasks scheduled due to neigh state change events::

    $ echo mlx5:mlx5e_rep_neigh_update >> /sys/kernel/debug/tracing/set_event
    $ cat /sys/kernel/debug/tracing/trace
    ...
    kworker/u48:7-2221 [009] ...1 1475.387435: mlx5e_rep_neigh_update: netdev: ens1f0 MAC: 24:8a:07:9a:17:9a IPv4: 1.1.1.10 IPv6: ::ffff:1.1.1.10 neigh_connected=1
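
When several of the tracepoints above are needed at once, the per-event ``echo`` commands can be wrapped in a small shell function. The following is a minimal sketch only: the function name ``mlx5_trace_enable`` is hypothetical and not part of the driver or of any tool. It assumes the tracing directory is passed explicitly (typically /sys/kernel/debug/tracing, as in the examples above) and that the caller has permission to write to it, which usually requires root.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only: enable a set of mlx5 tracepoints
# by appending "mlx5:<event>" lines to the set_event file of a tracing directory.
# Usage: mlx5_trace_enable TRACING_DIR EVENT...
mlx5_trace_enable() {
    if [ "$#" -lt 2 ]; then
        echo "usage: mlx5_trace_enable TRACING_DIR EVENT..." >&2
        return 1
    fi
    tracing_dir=$1
    shift
    for ev in "$@"; do
        # ">>" appends, so events that are already enabled stay enabled.
        echo "mlx5:${ev}" >> "${tracing_dir}/set_event" || return 1
    done
}
```

For example, ``mlx5_trace_enable /sys/kernel/debug/tracing mlx5e_configure_flower mlx5e_delete_flower`` would enable both flower tracepoints, after which the trace file can be read as shown above.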