Segmentation Offloads in the Linux Networking Stack

Introduction
============

This document describes a set of techniques in the Linux networking stack
to take advantage of segmentation offload capabilities of various NICs.

The following technologies are described:
 * TCP Segmentation Offload - TSO
 * UDP Fragmentation Offload - UFO
 * IPIP, SIT, GRE, UDP Tunnel, and Remote Checksum Offloads
 * Generic Segmentation Offload - GSO
 * Generic Receive Offload - GRO
 * Partial Generic Segmentation Offload - GSO_PARTIAL
 * SCTP acceleration with GSO - GSO_BY_FRAGS

TCP Segmentation Offload
========================

TCP segmentation allows a device to segment a single frame into multiple
frames with a data payload size specified in skb_shinfo()->gso_size.
When TCP segmentation is requested, the bit for either SKB_GSO_TCPV4 or
SKB_GSO_TCPV6 should be set in skb_shinfo()->gso_type and
skb_shinfo()->gso_size should be set to a non-zero value.

TCP segmentation is dependent on support for the use of partial checksum
offload. For this reason TSO is normally disabled if the Tx checksum
offload for a given device is disabled.

In order to support TCP segmentation offload it is necessary to populate
the network and transport header offsets of the skbuff so that the device
drivers will be able to determine the offsets of the IP or IPv6 header and
the TCP header. In addition, as CHECKSUM_PARTIAL is required, csum_start
should also point to the TCP header of the packet.

For IPv4 segmentation we support one of two types in terms of the IP ID.
The default behavior is to increment the IP ID with every segment. If the
GSO type SKB_GSO_TCP_FIXEDID is specified then we will not increment the IP
ID and all segments will use the same IP ID.
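
As a rough illustration of the two IP ID behaviors, the following userspace
sketch splits a payload into gso_size-sized pieces and assigns each piece an
IPv4 ID. It is not kernel code: struct tso_segment, tso_model_segment() and
the fixed_id flag are hypothetical names used only here, with fixed_id
standing in for SKB_GSO_TCP_FIXEDID.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified description of one emitted segment. */
struct tso_segment {
	size_t   payload_len;	/* TCP payload bytes in this segment */
	uint32_t seq;		/* sequence number of the first payload byte */
	uint16_t ip_id;		/* IPv4 Identification value */
};

/*
 * Split payload_len bytes into segments of at most gso_size bytes.
 * A non-zero fixed_id models SKB_GSO_TCP_FIXEDID: every segment reuses
 * first_ip_id. Otherwise the ID increments (and wraps) per segment.
 * Returns the number of segments written, or 0 on invalid input or
 * when out[] is too small to hold them all.
 */
size_t tso_model_segment(size_t payload_len, uint16_t gso_size,
			 uint32_t first_seq, uint16_t first_ip_id,
			 int fixed_id,
			 struct tso_segment *out, size_t out_cap)
{
	size_t n = 0, off = 0;

	if (gso_size == 0 || payload_len == 0)
		return 0;
	if ((payload_len + gso_size - 1) / gso_size > out_cap)
		return 0;

	while (off < payload_len) {
		size_t chunk = payload_len - off;

		if (chunk > gso_size)
			chunk = gso_size;

		out[n].payload_len = chunk;
		out[n].seq = first_seq + (uint32_t)off;
		out[n].ip_id = fixed_id ? first_ip_id
					: (uint16_t)(first_ip_id + n);
		off += chunk;
		n++;
	}
	return n;
}
```

With a 3000 byte payload and a gso_size of 1448 this yields three segments,
the last one carrying the remaining 104 bytes.
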
If a device has NETIF_F_TSO_MANGLEID set then the IP ID can be ignored when
performing TSO and we will either increment the IP ID for all frames, or
leave it at a static value based on driver preference.

UDP Fragmentation Offload
=========================

UDP fragmentation offload allows a device to fragment an oversized UDP
datagram into multiple IPv4 fragments. Many of the requirements for UDP
fragmentation offload are the same as TSO. However, the IPv4 ID for
fragments should not increment, as a single IPv4 datagram is fragmented.

UFO is deprecated: modern kernels will no longer generate UFO skbs, but can
still receive them from tuntap and similar devices. Offload of UDP-based
tunnel protocols is still supported.

IPIP, SIT, GRE, UDP Tunnel, and Remote Checksum Offloads
========================================================

In addition to the offloads described above it is possible for a frame to
contain additional headers such as an outer tunnel. In order to account
for such instances an additional set of segmentation offload types was
introduced, including SKB_GSO_IPXIP4, SKB_GSO_IPXIP6, SKB_GSO_GRE, and
SKB_GSO_UDP_TUNNEL. These extra segmentation types are used to identify
cases where there is more than one set of headers. For example, in the
case of IPIP and SIT we should have the network and transport headers moved
from the standard list of headers to "inner" header offsets.

Currently only two levels of headers are supported. The convention is to
refer to the tunnel headers as the outer headers, while the encapsulated
data is normally referred to as the inner headers.
Below is the list of calls to access the given headers:

IPIP/SIT Tunnel:
             Outer                  Inner
MAC          skb_mac_header
Network      skb_network_header     skb_inner_network_header
Transport    skb_transport_header

UDP/GRE Tunnel:
             Outer                  Inner
MAC          skb_mac_header         skb_inner_mac_header
Network      skb_network_header     skb_inner_network_header
Transport    skb_transport_header   skb_inner_transport_header

In addition to the above tunnel types there are also SKB_GSO_GRE_CSUM and
SKB_GSO_UDP_TUNNEL_CSUM. These two additional tunnel types reflect the
fact that the outer header also requests to have a non-zero checksum
included in the outer header.

Finally there is SKB_GSO_TUNNEL_REMCSUM which indicates that a given tunnel
header has requested a remote checksum offload. In this case the inner
headers will be left with a partial checksum and only the outer header
checksum will be computed.

Generic Segmentation Offload
============================

Generic segmentation offload is a pure software offload that is meant to
deal with cases where device drivers cannot perform the offloads described
above. What occurs in GSO is that a given skbuff will have its data broken
out over multiple skbuffs that have been resized to match the MSS provided
via skb_shinfo()->gso_size.

Before enabling any hardware segmentation offload a corresponding software
offload is required in GSO. Otherwise it becomes possible for a frame to
be re-routed between devices and end up being unable to be transmitted.

Generic Receive Offload
=======================

Generic receive offload is the complement to GSO. Ideally any frame
assembled by GRO should be segmented to create an identical sequence of
frames using GSO, and any sequence of frames segmented by GSO should be
able to be reassembled back to the original by GRO. The only exception to
this is IPv4 ID in the case that the DF bit is set for a given IP header.
If the value of the IPv4 ID is not sequentially incrementing it will be
altered so that it is when a frame assembled via GRO is segmented via GSO.

Partial Generic Segmentation Offload
====================================

Partial generic segmentation offload is a hybrid between TSO and GSO. What
it effectively does is take advantage of certain traits of TCP and tunnels
so that, instead of having to rewrite the packet headers for each segment,
only the inner-most transport header and possibly the outer-most network
header need to be updated. This allows devices that do not support tunnel
offloads or tunnel offloads with checksum to still make use of segmentation.

With the partial offload what occurs is that all headers excluding the
inner transport header are updated such that they will contain the values
that would be correct if the header were simply duplicated. The one
exception to this is the outer IPv4 ID field. It is up to the device
drivers to guarantee that the IPv4 ID field is incremented in the case that
a given header does not have the DF bit set.

SCTP acceleration with GSO
==========================

SCTP - despite the lack of hardware support - can still take advantage of
GSO to pass one large packet through the network stack, rather than
multiple small packets.

This requires a different approach from other offloads, as SCTP packets
cannot be just segmented to (P)MTU. Rather, the chunks must be contained in
IP segments, padding respected. So unlike regular GSO, SCTP can't just
generate a big skb, set gso_size to the fragmentation point and deliver it
to the IP layer.

Instead, the SCTP protocol layer builds an skb with the segments correctly
padded and stored as chained skbs, and skb_segment() splits based on those.
To signal this, gso_size is set to the special value GSO_BY_FRAGS.
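
The difference between size-based and fragment-based splitting can be
sketched in plain userspace C. Everything below is an illustrative model,
not the kernel implementation: struct model_skb and model_seg_len() are
hypothetical, and MODEL_GSO_BY_FRAGS is assumed to be 0xFFFF (check the
kernel headers for the real definition of GSO_BY_FRAGS).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed sentinel value, for illustration only. */
#define MODEL_GSO_BY_FRAGS 0xFFFF

/*
 * Hypothetical stand-in for an skb: a total payload length plus an
 * optional list of pre-built, already padded fragment lengths.
 */
struct model_skb {
	size_t        payload_len;
	uint16_t      gso_size;
	const size_t *frag_lens;	/* used only with MODEL_GSO_BY_FRAGS */
	size_t        nr_frags;
};

/* Payload length of segment idx, or 0 when idx is past the last one. */
size_t model_seg_len(const struct model_skb *skb, size_t idx)
{
	size_t off;

	/* Fragment-based: boundaries were chosen by the protocol layer. */
	if (skb->gso_size == MODEL_GSO_BY_FRAGS)
		return idx < skb->nr_frags ? skb->frag_lens[idx] : 0;

	/* Size-based: cut every gso_size bytes, last piece may be short. */
	if (skb->gso_size == 0)
		return 0;
	off = idx * skb->gso_size;
	if (off >= skb->payload_len)
		return 0;
	return skb->payload_len - off < skb->gso_size ?
	       skb->payload_len - off : skb->gso_size;
}
```

In the fragment-based case the segment lengths need not be equal, which is
why code that assumes every segment is gso_size bytes long gives wrong
answers for these skbs.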

Therefore, any code in the core networking stack must be aware of the
possibility that gso_size will be GSO_BY_FRAGS and handle that case
appropriately.

There are some helpers to make this easier:

 - skb_is_gso(skb) && skb_is_gso_sctp(skb) is the best way to see if
   an skb is an SCTP GSO skb.

 - For size checks, the skb_gso_validate_*_len family of helpers correctly
   considers GSO_BY_FRAGS.

 - For manipulating packets, skb_increase_gso_size and skb_decrease_gso_size
   will check for GSO_BY_FRAGS and WARN if asked to manipulate these skbs.

This also affects drivers with the NETIF_F_FRAGLIST and NETIF_F_GSO_SCTP
bits set. Note also that NETIF_F_GSO_SCTP is included in
NETIF_F_GSO_SOFTWARE.
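
The kind of guard these helpers apply can be modeled in userspace as
follows. This is a sketch under stated assumptions rather than the kernel
code: the sketch_* names are hypothetical, the sentinel is assumed to be
0xFFFF, a boolean return replaces the kernel's WARN, and the check against
growing into the sentinel is an addition of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_GSO_BY_FRAGS 0xFFFF	/* assumed sentinel value */

/*
 * Grow *gso_size by increment. Refuses (returns false, value untouched)
 * when the skb is segmented by fragments, or when the result would
 * reach or wrap past the sentinel.
 */
bool sketch_increase_gso_size(uint16_t *gso_size, uint16_t increment)
{
	if (*gso_size == SKETCH_GSO_BY_FRAGS)
		return false;
	if (increment >= SKETCH_GSO_BY_FRAGS - *gso_size)
		return false;
	*gso_size += increment;
	return true;
}

/*
 * True when every resulting segment (hdr_len bytes of headers plus its
 * payload) fits within limit. frag_lens/nr_frags are consulted only
 * for the fragment-based case.
 */
bool sketch_validate_len(uint16_t gso_size, size_t hdr_len,
			 const size_t *frag_lens, size_t nr_frags,
			 size_t limit)
{
	size_t i;

	if (gso_size != SKETCH_GSO_BY_FRAGS)
		return hdr_len + gso_size <= limit;

	for (i = 0; i < nr_frags; i++)
		if (hdr_len + frag_lens[i] > limit)
			return false;
	return true;
}
```

The point of both functions is the same: gso_size is treated as a length
only after ruling out the sentinel, and fragment-based skbs are checked
per fragment instead.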