Netdev features mess and how to get out from it alive
=====================================================

Author:
        Michał Mirosław <mirq-linux@rere.qmqm.pl>



 Part I: Feature sets
======================

Long gone are the days when a network card would just take and give packets
verbatim. Today's devices add multiple features and bugs (read: offloads)
that relieve an OS of various tasks like generating and checking checksums,
splitting packets, classifying them. Those capabilities and their state
are commonly referred to as netdev features in Linux kernel world.

There are currently three sets of features relevant to the driver, and
one used internally by network core:

 1. netdev->hw_features set contains features whose state may possibly
    be changed (enabled or disabled) for a particular device by user's
    request. This set should be initialized in ndo_init callback and not
    changed later.

 2. netdev->features set contains features which are currently enabled
    for a device. This should be changed only by network core or in
    error paths of ndo_set_features callback.

 3. netdev->vlan_features set contains features whose state is inherited
    by child VLAN devices (limits netdev->features set). This is currently
    used for all VLAN devices whether tags are stripped or inserted in
    hardware or software.

 4. netdev->wanted_features set contains feature set requested by user.
    This set is filtered by ndo_fix_features callback whenever it or
    some device-specific conditions change. This set is internal to
    networking core and should not be referenced in drivers.



 Part II: Controlling enabled features
=======================================

When current feature set (netdev->features) is to be changed, new set
is calculated and filtered by calling ndo_fix_features callback
and netdev_fix_features().
If the resulting set differs from current set, it is passed to
ndo_set_features callback and (if the callback returns success) replaces
value stored in netdev->features. NETDEV_FEAT_CHANGE notification is issued
after that whenever current set might have changed.

The following events trigger recalculation:
 1. device's registration, after ndo_init returned success
 2. user requested changes in features state
 3. netdev_update_features() is called

ndo_*_features callbacks are called with rtnl_lock held. Missing callbacks
are treated as always returning success.

A driver that wants to trigger recalculation must do so by calling
netdev_update_features() while holding rtnl_lock. This should not be done
from ndo_*_features callbacks. netdev->features should not be modified by
driver except by means of ndo_fix_features callback.



 Part III: Implementation hints
================================

 * ndo_fix_features:

All dependencies between features should be resolved here. The resulting
set can be reduced further by networking core imposed limitations (as coded
in netdev_fix_features()). For this reason it is safer to disable a feature
when its dependencies are not met instead of forcing the dependency on.

This callback should not modify hardware nor driver state (should be
stateless). It can be called multiple times between successive
ndo_set_features calls.

Callback must not alter features contained in NETIF_F_SOFT_FEATURES or
NETIF_F_NEVER_CHANGE sets. The exception is NETIF_F_VLAN_CHALLENGED but
care must be taken as the change won't affect already configured VLANs.

 * ndo_set_features:

Hardware should be reconfigured to match passed feature set. The set
should not be altered unless some error condition happens that can't
be reliably detected in ndo_fix_features. In this case, the callback
should update netdev->features to match resulting hardware state.
Errors returned are not (and cannot be) propagated anywhere except dmesg.
(Note: successful return is zero, >0 means silent error.)



 Part IV: Features
===================

For current list of features, see include/linux/netdev_features.h.
This section describes semantics of some of them.

 * Transmit checksumming

For complete description, see comments near the top of include/linux/skbuff.h.

Note: NETIF_F_HW_CSUM is a superset of NETIF_F_IP_CSUM + NETIF_F_IPV6_CSUM.
It means that device can fill TCP/UDP-like checksum anywhere in the packets
whatever headers there might be.

 * Transmit TCP segmentation offload

NETIF_F_TSO_ECN means that hardware can properly split packets with CWR bit
set, be it TCPv4 (when NETIF_F_TSO is enabled) or TCPv6 (NETIF_F_TSO6).

 * Transmit UDP segmentation offload

NETIF_F_GSO_UDP_L4 accepts a single UDP header with a payload that exceeds
gso_size. On segmentation, it segments the payload on gso_size boundaries and
replicates the network and UDP headers (fixing up the last one if less than
gso_size).

 * Transmit DMA from high memory

On platforms where this is relevant, NETIF_F_HIGHDMA signals that
ndo_start_xmit can handle skbs with frags in high memory.

 * Transmit scatter-gather

These features say that ndo_start_xmit can handle fragmented skbs:
NETIF_F_SG --- paged skbs (skb_shinfo()->frags), NETIF_F_FRAGLIST ---
chained skbs (skb->next/prev list).

 * Software features

Features contained in NETIF_F_SOFT_FEATURES are features of networking
stack. Driver should not change behaviour based on them.

 * LLTX driver (deprecated for hardware drivers)

NETIF_F_LLTX is meant to be used by drivers that don't need locking at all,
e.g. software tunnels.

This is also used in a few legacy drivers that implement their
own locking. Don't use it for new (hardware) drivers.
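
As a rough illustration of the transmit UDP segmentation rule described
earlier in this part, the following userspace sketch computes how a payload
is cut on gso_size boundaries. It is not kernel code: struct udp_seg_plan and
udp_seg_plan() are hypothetical names introduced here, and only payload
lengths are modelled (the replicated headers are ignored).

```c
#include <stddef.h>

/* Hypothetical helper type: result of cutting one UDP payload. */
struct udp_seg_plan {
	size_t nsegs;     /* number of segments produced */
	size_t last_len;  /* payload length of the final segment */
};

/*
 * Cut payload_len bytes on gso_size boundaries. Every segment but the
 * last carries exactly gso_size bytes; the last one carries the rest.
 * Returns a zeroed plan if gso_size is 0 or the payload does not
 * exceed gso_size (nothing to segment in this model).
 */
static struct udp_seg_plan udp_seg_plan(size_t payload_len, size_t gso_size)
{
	struct udp_seg_plan plan = { 0, 0 };
	size_t rem;

	if (gso_size == 0 || payload_len <= gso_size)
		return plan;

	rem = payload_len % gso_size;
	plan.nsegs = payload_len / gso_size + (rem ? 1 : 0);
	plan.last_len = rem ? rem : gso_size;
	return plan;
}
```

With these assumptions, a 3000-byte payload and a gso_size of 1200 give three
segments carrying 1200, 1200 and 600 bytes; the shorter final segment is the
one whose headers need fixing up.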

 * netns-local device

NETIF_F_NETNS_LOCAL is set for devices that are not allowed to move between
network namespaces (e.g. loopback).

Don't use it in drivers.

 * VLAN challenged

NETIF_F_VLAN_CHALLENGED should be set for devices which can't cope with VLAN
headers. Some drivers set this because the cards can't handle the bigger MTU.
[FIXME: Those cases could be fixed in VLAN code by allowing only reduced-MTU
VLANs. This may not be useful, though.]

 * rx-fcs

This requests that the NIC append the Ethernet Frame Check Sequence (FCS)
to the end of the skb data. This allows sniffers and other tools to
read the CRC recorded by the NIC on receipt of the packet.

 * rx-all

This requests that the NIC receive all possible frames, including errored
frames (such as bad FCS, etc). This can be helpful when sniffing a link with
bad packets on it. Some NICs may receive more packets if also put into normal
PROMISC mode.

 * rx-gro-hw

This requests that the NIC enable Hardware GRO (generic receive offload).
Hardware GRO is basically the exact reverse of TSO, and is generally
stricter than Hardware LRO. A packet stream merged by Hardware GRO must
be re-segmentable by GSO or TSO back to the exact original packet stream.
Hardware GRO is dependent on RXCSUM since every packet successfully merged
by hardware must also have the checksum verified by hardware.
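
The RXCSUM dependency above is the kind of rule that Part III asks
ndo_fix_features to resolve by dropping the dependent feature. The following
is a userspace sketch only: the X_F_* bit values and example_fix_features()
are hypothetical stand-ins, not the kernel's NETIF_F_* definitions or any
real driver's callback.

```c
#include <stdint.h>

/* Hypothetical feature bits for illustration; not the kernel's values. */
#define X_F_RXCSUM  (UINT64_C(1) << 0)
#define X_F_GRO_HW  (UINT64_C(1) << 1)

/*
 * Stateless filter in the spirit of ndo_fix_features: takes the requested
 * set and returns the set that can actually be enabled. It touches no
 * hardware or driver state, so it may be called any number of times.
 */
static uint64_t example_fix_features(uint64_t requested)
{
	/*
	 * Hardware GRO needs hardware-verified checksums. When RXCSUM is
	 * not requested, drop GRO_HW instead of forcing RXCSUM on.
	 */
	if (!(requested & X_F_RXCSUM))
		requested &= ~X_F_GRO_HW;

	return requested;
}
```

In this model, requesting hardware GRO alone yields an empty set, while
requesting it together with RXCSUM leaves both bits set.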