AF_XDP TX Metadata

This document describes how to enable offloads when transmitting packets via AF_XDP. Refer to XDP RX Metadata for how to access similar metadata on the receive side.

General Design

The headroom for the metadata is reserved via the tx_metadata_len field and the XDP_UMEM_TX_METADATA_LEN flag in struct xdp_umem_reg. The metadata length is therefore the same for every socket that shares the same umem. The metadata layout is a fixed UAPI; refer to struct xsk_tx_metadata in include/uapi/linux/if_xdp.h. Thus, generally, the tx_metadata_len field above should contain sizeof(struct xsk_tx_metadata).

Note that in the original implementation the XDP_UMEM_TX_METADATA_LEN flag was not required. Applications that need to run on such kernels can first try to create the umem with the flag and, if that fails, retry without it.
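The registration argument described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: fill_umem_reg() is a hypothetical helper, and the struct and flag definitions are local mirrors of what <linux/if_xdp.h> is assumed to provide, included only so that the sketch compiles on its own.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Local mirrors of the UAPI definitions so the sketch compiles on its own.
 * Real applications get these from <linux/if_xdp.h>; newer kernels may
 * extend the metadata struct.
 */
#define XDP_UMEM_TX_METADATA_LEN (1 << 2)

struct xsk_tx_metadata {
	uint64_t flags;
	union {
		struct {
			uint16_t csum_start;
			uint16_t csum_offset;
		} request;
		struct {
			uint64_t tx_timestamp;
		} completion;
	};
};

struct xdp_umem_reg {
	uint64_t addr;
	uint64_t len;
	uint32_t chunk_size;
	uint32_t headroom;
	uint32_t flags;
	uint32_t tx_metadata_len;
};

/* Hypothetical helper: prepare the argument for the XDP_UMEM_REG call. */
static void fill_umem_reg(struct xdp_umem_reg *reg, void *area, uint64_t size,
			  uint32_t chunk_size, int with_flag)
{
	assert(reg && area);
	memset(reg, 0, sizeof(*reg));
	reg->addr = (uint64_t)(uintptr_t)area;
	reg->len = size;
	reg->chunk_size = chunk_size;
	/* Reserve exactly one metadata struct in front of every packet. */
	reg->tx_metadata_len = sizeof(struct xsk_tx_metadata);
	if (with_flag)
		reg->flags |= XDP_UMEM_TX_METADATA_LEN;
}
```

An application would typically pass the result to setsockopt() with XDP_UMEM_REG, calling the helper with with_flag set to 1 first and, if registration fails, again with with_flag set to 0.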

The headroom and the metadata itself should be located right before xdp_desc->addr in the umem frame. Within a frame, the metadata layout is as follows:

       tx_metadata_len
 /                         \
+-----------------+---------+----------------------------+
| xsk_tx_metadata | padding |          payload           |
+-----------------+---------+----------------------------+
                            ^
                            |
                      xdp_desc->addr

An AF_XDP application can request headroom larger than sizeof(struct xsk_tx_metadata). The kernel will ignore the padding (and will still use xdp_desc->addr - tx_metadata_len to locate the xsk_tx_metadata). For frames that shouldn’t carry any metadata (i.e., the ones that don’t have the XDP_TX_METADATA option set), the metadata area is ignored by the kernel as well.
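The address arithmetic above can be sketched in user space as follows. The helper names are hypothetical, and umem_area is assumed to be the start of the application's mapping of the umem, with desc_addr being the value stored in xdp_desc->addr.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: start of the metadata area for a frame, which sits
 * tx_metadata_len bytes before the payload regardless of any padding.
 */
static void *tx_metadata_ptr(void *umem_area, uint64_t desc_addr,
			     uint32_t tx_metadata_len)
{
	/* The frame must leave room for the metadata in front of the payload. */
	assert(desc_addr >= tx_metadata_len);
	return (uint8_t *)umem_area + desc_addr - tx_metadata_len;
}

/* Hypothetical helper: start of the payload, i.e. xdp_desc->addr. */
static void *tx_payload_ptr(void *umem_area, uint64_t desc_addr)
{
	return (uint8_t *)umem_area + desc_addr;
}
```

With tx_metadata_len of 32 and a payload at offset 256, the metadata struct starts at offset 224 and the bytes between its end and offset 256 are padding.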

The flags field of the metadata selects which offloads to enable:

  • XDP_TXMD_FLAGS_TIMESTAMP: requests the device to put the transmission timestamp into the tx_timestamp field of struct xsk_tx_metadata.

  • XDP_TXMD_FLAGS_CHECKSUM: requests the device to calculate the L4 checksum. csum_start specifies the byte offset at which checksumming should start, and csum_offset specifies the byte offset at which the device should store the computed checksum.
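As an illustration of the two flags, the sketch below fills the metadata for a UDP packet over IPv4 without IP options. The definitions are local mirrors of what <linux/if_xdp.h> is assumed to provide, and request_udp4_offloads() is a hypothetical helper. The sketch also assumes, following the kernel selftest, that csum_start is counted from the start of the packet and csum_offset from csum_start.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirrors of the UAPI definitions; real code uses <linux/if_xdp.h>. */
#define XDP_TXMD_FLAGS_TIMESTAMP (1 << 0)
#define XDP_TXMD_FLAGS_CHECKSUM  (1 << 1)

struct xsk_tx_metadata {
	uint64_t flags;
	union {
		struct {
			uint16_t csum_start;
			uint16_t csum_offset;
		} request;
		struct {
			uint64_t tx_timestamp;
		} completion;
	};
};

enum {
	ETH_HDR_LEN = 14,	/* Ethernet header without VLAN tags */
	IPV4_HDR_LEN = 20,	/* IPv4 header without options */
	UDP_CHECK_OFFSET = 6,	/* offset of the checksum field in the UDP header */
};

/* Hypothetical helper: request a TX timestamp and an L4 checksum. */
static void request_udp4_offloads(struct xsk_tx_metadata *meta)
{
	assert(meta);
	memset(meta, 0, sizeof(*meta));
	meta->flags = XDP_TXMD_FLAGS_TIMESTAMP | XDP_TXMD_FLAGS_CHECKSUM;
	/* Checksumming starts at the UDP header ... */
	meta->request.csum_start = ETH_HDR_LEN + IPV4_HDR_LEN;
	/* ... and the result lands in the UDP checksum field. */
	meta->request.csum_offset = UDP_CHECK_OFFSET;
}
```

Because the request and completion members share storage in this layout, the timestamp is only meaningful once the frame has completed transmission.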

Besides the flags above, in order to trigger the offloads, the packet’s first struct xdp_desc descriptor should set the XDP_TX_METADATA bit in its options field. Also note that in a multi-buffer packet only the first chunk should carry the metadata.
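The descriptor rule can be sketched for a packet that spans several chunks. set_tx_desc_options() is a hypothetical helper; the definitions are local mirrors of what <linux/if_xdp.h> is assumed to provide, and XDP_PKT_CONTD is the AF_XDP multi-buffer continuation bit, which this document does not otherwise describe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirrors of the UAPI definitions; real code uses <linux/if_xdp.h>. */
#define XDP_PKT_CONTD   (1 << 0)	/* more chunks of this packet follow */
#define XDP_TX_METADATA (1 << 1)	/* metadata precedes this chunk */

struct xdp_desc {
	uint64_t addr;
	uint32_t len;
	uint32_t options;
};

/*
 * Hypothetical helper: set the options of the n descriptors that make up
 * one packet. Only the first descriptor announces metadata.
 */
static void set_tx_desc_options(struct xdp_desc *descs, size_t n,
				int want_metadata)
{
	assert(descs && n > 0);
	for (size_t i = 0; i < n; i++) {
		uint32_t opts = 0;

		if (i == 0 && want_metadata)
			opts |= XDP_TX_METADATA;
		if (i + 1 < n)
			opts |= XDP_PKT_CONTD;
		descs[i].options = opts;
	}
}
```

For a three-chunk packet with metadata this yields options of XDP_TX_METADATA | XDP_PKT_CONTD, XDP_PKT_CONTD, and 0.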

Software TX Checksum

For development and testing purposes it’s possible to pass the XDP_UMEM_TX_SW_CSUM flag to the XDP_UMEM_REG UMEM registration call. In this case, when running in XDP_COPY mode, the TX checksum is calculated on the CPU. Do not enable this option in production because it will negatively affect performance.
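To give an idea of the work this moves onto the CPU, the sketch below emulates a checksum request in user space. It is not the kernel's code: sw_tx_csum() is a hypothetical helper that assumes generic partial-checksum semantics, namely that csum_start is counted from the start of the packet, that csum_offset is counted from csum_start, and that the application has already seeded the checksum field (for example with the pseudo-header sum).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit ones' complement accumulator down to 16 bits. */
static uint16_t csum_fold32(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/*
 * Hypothetical helper: sum everything from csum_start to the end of the
 * packet (including the seeded checksum field) and store the complement
 * at csum_start + csum_offset in network byte order.
 */
static void sw_tx_csum(uint8_t *pkt, size_t len, uint16_t csum_start,
		       uint16_t csum_offset)
{
	size_t pos = (size_t)csum_start + csum_offset;
	uint32_t sum = 0;
	uint16_t csum;
	size_t i;

	assert(pkt && len <= 65536 && pos + 2 <= len);
	for (i = csum_start; i + 1 < len; i += 2)
		sum += ((uint32_t)pkt[i] << 8) | pkt[i + 1];
	if (i < len)			/* odd trailing byte, zero padded */
		sum += (uint32_t)pkt[i] << 8;

	csum = (uint16_t)~csum_fold32(sum);
	if (csum == 0)			/* zero is sent as all ones */
		csum = 0xffff;
	pkt[pos] = (uint8_t)(csum >> 8);
	pkt[pos + 1] = (uint8_t)(csum & 0xff);
}
```

Every payload byte is read once per packet, which is one reason the software path is unsuitable for production traffic.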

Querying Device Capabilities

Every device exports its offload capabilities via the netlink netdev family. Refer to the xsk-flags features bitmask in Documentation/netlink/specs/netdev.yaml.

  • tx-timestamp: device supports XDP_TXMD_FLAGS_TIMESTAMP

  • tx-checksum: device supports XDP_TXMD_FLAGS_CHECKSUM

See tools/net/ynl/samples/netdev.c on how to query this information.
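Once the bitmask has been retrieved, an application can restrict its requests to what the device supports. The sketch below is a minimal illustration: usable_txmd_flags() is a hypothetical helper, and the constants are local mirrors of the values that <linux/netdev.h> and <linux/if_xdp.h> are assumed to define.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirrors of the UAPI values; real code uses the kernel headers. */
#define NETDEV_XSK_FLAGS_TX_TIMESTAMP 1	/* xsk-flags: tx-timestamp */
#define NETDEV_XSK_FLAGS_TX_CHECKSUM  2	/* xsk-flags: tx-checksum */

#define XDP_TXMD_FLAGS_TIMESTAMP (1 << 0)
#define XDP_TXMD_FLAGS_CHECKSUM  (1 << 1)

/*
 * Hypothetical helper: keep only the wanted metadata flags that the
 * device advertises in its xsk features bitmask.
 */
static uint64_t usable_txmd_flags(uint64_t xsk_features, uint64_t wanted)
{
	uint64_t supported = 0;

	if (xsk_features & NETDEV_XSK_FLAGS_TX_TIMESTAMP)
		supported |= XDP_TXMD_FLAGS_TIMESTAMP;
	if (xsk_features & NETDEV_XSK_FLAGS_TX_CHECKSUM)
		supported |= XDP_TXMD_FLAGS_CHECKSUM;
	return wanted & supported;
}
```

An application that finds the checksum flag missing from the result would have to compute the L4 checksum itself before queueing the frame.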

Example

See tools/testing/selftests/bpf/xdp_hw_metadata.c for an example program that handles TX metadata. Also see https://github.com/fomichev/xskgen for a more bare-bones example.