
TCP Tuning for 10G

(2024-11-12)

This page contains a quick reference guide to Linux tuning for data transfer hosts connected at speeds of 1Gbps or higher. Note that most of the tuning settings described here will actually decrease the performance of hosts connected at rates below 1Gbps, such as most home connections.

Note that the settings on this page are not attempting to achieve full 10G with a single flow. These settings assume you are using tools that support parallel streams, or have multiple data transfers occurring in parallel, and want fair sharing between the flows. As such, the maximum values are a factor of 2 to 4 smaller than what would be required to support a single stream. As an example, a 10Gbps flow across a 100ms network requires 120MB of buffering (see the BDP calculator). Most data movement applications, such as GridFTP, would employ 4-8 streams to do this efficiently and to guard against congestive packet loss. Setting your 10Gbps-capable host to consume a maximum of 32MB - 64MB per socket ensures that parallel streams work well and do not consume a majority of system resources.
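As a quick sanity check on that 120MB figure, the bandwidth-delay product (BDP) is simply bandwidth times round-trip time. A minimal shell calculation for the 10Gbps / 100ms example above:

# BDP in bytes = bandwidth (bits/sec) * RTT (sec) / 8
# 10 Gbit/s over a 100ms path:
echo $(( 10000000000 / 8 / 10 ))   # 125000000 bytes, i.e. roughly 120MB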

Like most modern OSes, Linux now does a good job of auto-tuning the TCP buffers, but the default maximum Linux TCP buffer sizes are still too small for 10G networks. Here are some example sysctl.conf configurations for different types of hosts.

For a host with a 10G NIC, optimized for network paths up to 100ms RTT, and for friendliness to single and parallel stream tools, add this to /etc/sysctl.conf:

(Note that additional tuning is needed for hosts with 100G NICs)

# allow TCP with buffers up to 64MB 
net.core.rmem_max = 67108864 
net.core.wmem_max = 67108864 
# increase Linux autotuning TCP buffer limit to 32MB
net.ipv4.tcp_rmem = 4096 87380 33554432
net.ipv4.tcp_wmem = 4096 65536 33554432
# recommended for hosts with jumbo frames enabled
net.ipv4.tcp_mtu_probing=1
# recommended to use a 'fair queueing' qdisc (either fq or fq_codel)
net.core.default_qdisc = fq
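After adding these lines to /etc/sysctl.conf, they can be loaded without a reboot (run as root; note that some distributions also read drop-in files from /etc/sysctl.d/):

sysctl -p
# spot-check that a value took effect:
sysctl net.core.rmem_max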

Note that fq_codel became the default qdisc starting with the 4.12 kernel in 2017. Both fq and fq_codel work well and support pacing, although fq is the one recommended by the BBR team at Google for use with BBR congestion control.

Also note that we no longer recommend setting the congestion control algorithm to htcp. With newer kernels, htcp no longer appears to offer an advantage over the default of cubic.

We also strongly recommend reducing the maximum flow rate to avoid bursts of packets that could overflow switch and receive host buffers.

For example, for a 10G host, add this to a boot script:

/sbin/tc qdisc add dev ethN root fq maxrate 8gbit 

For a 10G host running data transfer tools that use 4 parallel streams, do this:

/sbin/tc qdisc add dev ethN root fq maxrate 2gbit 

Where 'ethN' is the name of the ethernet device on your system.
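To confirm that the fq qdisc and the maxrate cap took effect, you can inspect the interface's queueing discipline (a simple verification step; the exact output format depends on your iproute2 version):

/sbin/tc -s qdisc show dev ethN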

For a host with a 10G NIC optimized for network paths up to 200ms RTT and for friendliness to single and parallel stream tools, or for a host with a 40G NIC on paths up to 50ms RTT:

# allow TCP with buffers up to 128MB
net.core.rmem_max = 134217728 
net.core.wmem_max = 134217728 
# increase TCP autotuning buffer limits.
net.ipv4.tcp_rmem = 4096 87380 67108864
net.ipv4.tcp_wmem = 4096 65536 67108864
# recommended for hosts with jumbo frames enabled
net.ipv4.tcp_mtu_probing=1
# recommended to enable 'fair queueing'
net.core.default_qdisc = fq

For a host with a 100G NIC optimized for network paths up to 200ms RTT, use this config:

# allow TCP with buffers up to 2GB (Max allowed in Linux is 2GB-1)
net.core.rmem_max=2147483647
net.core.wmem_max=2147483647
# increase TCP autotuning buffer limits. 
net.ipv4.tcp_rmem=4096 131072 1073741824 
net.ipv4.tcp_wmem=4096 16384 1073741824
# recommended for hosts with jumbo frames enabled
net.ipv4.tcp_mtu_probing=1
# recommended to enable 'fair queueing'
net.core.default_qdisc = fq
# need to increase this to use MSG_ZEROCOPY
net.core.optmem_max = 1048576 

Notes: you should leave net.ipv4.tcp_mem alone, as the defaults are fine. A number of performance experts say to also increase net.core.optmem_max to match net.core.rmem_max and net.core.wmem_max, but we have not found that it makes any difference. Some sites also say to set net.ipv4.tcp_timestamps and net.ipv4.tcp_sack to 0, as doing that reduces CPU load. We strongly disagree with that recommendation for WAN performance, as we have observed that the default value of 1 helps in more cases than it hurts.
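If you want to confirm what your kernel is currently using for these settings, you can read the values back directly:

sysctl net.ipv4.tcp_timestamps net.ipv4.tcp_sack net.core.optmem_max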

Linux supports pluggable congestion control algorithms. To get a list of congestion control algorithms that are available in your kernel, run:

sysctl net.ipv4.tcp_available_congestion_control

cubic is the default in most Linux distributions. You might also want to try BBR if it's available on your system.

To see the default congestion control algorithm, do:

sysctl net.ipv4.tcp_congestion_control 
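If 'bbr' shows up in the list of available algorithms and you want to experiment with it, it can be switched on at runtime (optional; BBR requires kernel 4.9 or later, and pairs well with the fq qdisc recommended above):

# only if 'bbr' appeared in tcp_available_congestion_control
sysctl -w net.ipv4.tcp_congestion_control=bbr
sysctl -w net.core.default_qdisc=fq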

If you are using Jumbo Frames, we recommend setting tcp_mtu_probing = 1 to help avoid the problem of MTU black holes. Setting it to 2 sometimes causes performance problems.
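Jumbo frames themselves are enabled per interface. For example, where 'ethN' is your ethernet device, and assuming every switch and the remote host on the path also support a 9000 byte MTU:

/sbin/ip link set dev ethN mtu 9000
# keep MTU probing enabled, as recommended above
sysctl -w net.ipv4.tcp_mtu_probing=1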

More Information:

Linux Network Performance Parameters explained

(Source for this whole page)