TIME_WAIT state
TIME_WAIT is the most complicated state in the TCP state transition diagram. At first glance its existence seems unnecessary, and some optimization techniques eliminate the TIME_WAIT state arbitrarily. But a protocol that has been in use for so many years must have its reasons. At the very least, before eliminating it we should understand the details, as Richard Stevens put it in his book: "Instead of trying to avoid the state, we should understand it." So what is a TCP endpoint in the TIME_WAIT state waiting for? Why does the endpoint transition to this state? What problems does it bring us? And is there any way to keep away from these problems?
How long does the TIME_WAIT state last?
According to the TCP specification, once an endpoint enters the TIME_WAIT state it should stay there for 2MSL, so that the remote endpoint has the best possible chance of receiving the final ACK of its FIN (and so that a retransmitted FIN can still be answered). While an endpoint is in TIME_WAIT, the 4-tuple defining that connection cannot be reused.
2MSL is twice the Maximum Segment Lifetime; most Linux systems define the MSL as 30 seconds, so 2MSL is 1 minute. As we know, the longest time an IP packet can stay alive in the network is bounded by its TTL field, but TTL counts hops rather than seconds, so there is no close relationship between MSL and TTL. In most versions of the Linux kernel the TIME_WAIT duration is hard-coded to 1 minute (see tcp.h), while some other operating systems provide a configuration interface for this value.
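For reference, the constant lives in include/net/tcp.h in the kernel source; in recent kernels the definition reads:

#define TCP_TIMEWAIT_LEN (60*HZ) /* how long to wait to destroy TIME-WAIT
                                  * state, about 60 seconds */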
Transitioning to the TIME_WAIT state
According to the TCP state transition diagram, the endpoint that initiates the FIN is the one that enters the TIME_WAIT state: whoever sends the FIN first, client or server, is the one that ends up in TIME_WAIT. On the other hand, client and server play very different roles during the communication, so whether the client or the server reaches TIME_WAIT first leads to different consequences. Here are some details. In cases 1 to 3, it is the client who initiates the FIN; in cases 4 and 5, it is the server who asks to disconnect first.
Client initiates the FIN request first
Case 1 Client initiates the FIN segment
First, I start the server process, then run the client twice, each time connecting to the server and performing one application request. The client therefore sends two FINs during this operation. Afterwards, I run netstat to observe the output.
Printout on server
Printout on client
The client process initiated the FIN twice. Notice that the second connection from the client to the server goes through with no rejection from the server: because TCP port 55725 on the client is in TIME_WAIT, the client's kernel assigns a brand-new ephemeral port, 55726, for the second connection. Most of the time, when a client makes a TCP connection request, a fresh client-side port is assigned to the process. The ephemeral port range is configurable; in Linux we can check it like this:
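$ cat /proc/sys/net/ipv4/ip_local_port_range
32768	60999

(The exact bounds vary by distribution; 32768 through 60999 is a common default, adjustable via the net.ipv4.ip_local_port_range sysctl.)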
This case indicates that when a client initiates a FIN to finish a TCP connection, it has no real effect on the server side and little effect on the client itself. But when designing a communication system that must open and close a large number of TCP connections within a very short period, the client-side ports can be exhausted. This situation should be considered and avoided, for example by using long-lived TCP connections, a TCP connection pool, or RST to disconnect.
Case 2 Client bound to a specific local port initiates the FIN
We bind the client to a specific local port:
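The original client code is not shown here, so the following is only a minimal sketch of the idea in C, assuming local port 55555 and the server address 10.22.5.3:1982 from the later cases:

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_in local = {0};
    local.sin_family      = AF_INET;
    local.sin_addr.s_addr = htonl(INADDR_ANY);
    local.sin_port        = htons(55555);   /* fixed client-side port */
    /* On the second run, this bind() fails with EADDRINUSE while the
     * first connection is still in TIME_WAIT. */
    if (bind(fd, (struct sockaddr *)&local, sizeof(local)) < 0) {
        perror("bind");
        return 1;
    }

    struct sockaddr_in server = {0};
    server.sin_family = AF_INET;
    server.sin_port   = htons(1982);
    inet_pton(AF_INET, "10.22.5.3", &server.sin_addr);
    if (connect(fd, (struct sockaddr *)&server, sizeof(server)) < 0) {
        perror("connect");
        return 1;
    }

    /* ... application request/response ... */

    close(fd);   /* the client initiates the FIN */
    return 0;
}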
Modify the code, recompile and run the client twice.
This indicates that while a TCP connection is in the TIME_WAIT state, the same connection (the same 4-tuple) cannot be recreated.
Case 3 Client bound to a local port sends a FIN to one server, then initiates a SYN to a brand-new server process
Run two server processes separately on host 10.22.5.3 and host 10.16.56.2. On host 10.16.56.2, run the client twice. We expect to establish two TCP connections, <10.16.56.2, 55555, 10.22.5.3, 1982> and <10.16.56.2, 55555, 10.16.56.2, 1982>, and we assume both will be established successfully. In practice, on Ubuntu Linux, they are not.
Obviously these are two different TCP connections, yet Ubuntu Linux forbids using port 55555 to start the second one. This looks unreasonable, but it follows from how bind() works: at bind() time the kernel does not yet know the destination address, so it cannot tell that the new connection's 4-tuple would differ from the one in TIME_WAIT. We can only wait for the timeout and then initiate the second connection.
Server initiates the FIN request
Case 4 Server initiates the FIN, then restarts
TIME_WAIT brings several kinds of problems to the server, and it has a much greater influence on the communication than TIME_WAIT on the client's side. As communication system engineers, we should treat the TIME_WAIT state on the server with higher priority.
Start the server process and connect to it with two different clients; both connections are established successfully. Then kill the server.
Trying to restart the server immediately afterwards fails.
When the server initiates the FIN, the TCP endpoint on the server side enters the TIME_WAIT state. If a server is serving a huge number of clients, all of those connections transition to TIME_WAIT at that moment, and bind() on the listening port fails until they expire, which is why the restart above fails.
Case 5 Server initiates the FIN, then a client bound to a specific port initiates the SYN twice
Bind the client TCP port to 55555 and connect to the server twice.
Observing the captured packets, note the 8th packet, marked black by Wireshark with the hint "TCP port numbers reused": the TIME_WAIT TCP port on the server side does not resist the connection request from the client, so in this situation the TIME_WAIT port can be reused. Linux accepts a SYN for a 4-tuple in TIME_WAIT as long as the new segment's sequence number (or TCP timestamp) is beyond the last one seen on the old incarnation, which guarantees that stray segments from the old connection cannot be mistaken for the new one.
Why is TIME_WAIT required?
Why wait for 2MSL? Suppose there were no TIME_WAIT; what would happen?
Scenario 1
Suppose it’s alowed to create two identical (4-tuple) tcp connection at the same time. The 2nd connection is an incarnation of the first one. If there are packets delayed during the first connection, but still alive until the incarnation connection is created. (Because the waiting time is not long enough to make sure the network discard the delayed packets.), this will bring some unknown errors into the network.
Although it’s a event of small probability, there’s still possibilities. The protocal itself has already get some preventive measures to keep this situation from happenning. First, during 3 way handshakes, ISN is one of the measures, second, the client tcp port is assigned by the os kernel most of the time with an ephemeral port which ensures that a new connection to the same host with a different 4 tuple id.
Scenario 2
Suppose a TCP disconnection is in progress. The client sends a FIN and receives an ACK, but the subsequent FIN from the server, or the last ACK the client sends in reply, is lost in the network. What happens if the client does not wait for 2MSL? The server retransmits its FIN, but the client, which already considers the communication over, answers the retransmitted FIN with a RST. The server receives the RST and concludes that the close did not complete successfully.
Dealing with the problems TIME_WAIT brings
Before any optimization against TIME_WAIT, think it over and consider every relevant detail, to make sure you do not introduce additional problems.
Strategy 1 Change the TIME_WAIT duration
Refer to your OS manual; Ubuntu has no such setting at the time of writing, since the duration is compiled into the kernel.
Strategy 2 The SO_REUSEADDR socket option
After calling socket(), set SO_REUSEADDR before bind(). The TIME_WAIT sockets themselves remain, but they no longer prevent the process from binding the port, so a server can restart immediately (compare case 4).
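A minimal sketch of a server using it (port 1982 as in case 4; error handling abbreviated):

#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0)
        perror("setsockopt(SO_REUSEADDR)");

    struct sockaddr_in addr = {0};
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port        = htons(1982);   /* server port from case 4 */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("bind");   /* without SO_REUSEADDR, this is what fails
                           * right after the old server is killed */
        return 1;
    }
    listen(fd, 128);

    /* ... accept loop ... */
    return 0;
}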
Strategy 3 Ensure the client sends the first FIN
Whoever sends the FIN first is the one that transitions to TIME_WAIT, and extra resources stay tied up for the duration of that state. Compared with clients, the resources on the server are much more expensive and valuable, so it is better to let the client close first.
Strategy 4 Disconnect with RST
No matter who sends the RST segment to disconnect, neither side transitions to TIME_WAIT and the TCP data structures are released at once. We then write some additional code at the application layer to make sure the communication was complete and successful.
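One common way to force a RST on close is SO_LINGER with a zero timeout; a minimal sketch:

#include <sys/socket.h>
#include <unistd.h>

/* Close fd with a RST instead of a FIN: neither side enters TIME_WAIT,
 * and any unsent data is discarded, so call this only after the
 * application layer has confirmed the exchange is complete. */
static void close_with_rst(int fd)
{
    struct linger lg = { .l_onoff = 1, .l_linger = 0 };
    setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
    close(fd);   /* sends RST rather than FIN */
}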
Three unusual cases involving the Linux network subsystem
This article presents three short stories from our practice, which happened at different times and on different projects. What unites them is that they all involve the Linux network subsystem (Reverse Path Filter, TIME_WAIT, multicast) and illustrate how deeply you often have to dig into an incident you are facing for the first time in order to solve it... and, of course, the satisfaction the resulting solution can bring.
Story one: Reverse Path Filter
A client with a large corporate network decided to pass part of its Internet traffic through a single corporate firewall located behind the router of the central office. Using iproute2, Internet-bound traffic was directed to the central office, where several routing tables had already been configured. We added an extra routing table, configured redirect routes to the firewall in it, enabled redirection of traffic from the other branches, and... the traffic did not flow.
Traffic flow through Netfilter tables and chains
We began investigating why the configured routing did not work. Traffic was visible on the router's incoming tunnel interface:
However, there were no packets on the outgoing interface. Clearly they were being filtered on the router, yet no explicit drop rules were set in iptables. So, following the traffic path step by step, we began adding rules that would drop our packets and checking the counters after each one:
We checked nat PREROUTING and mangle PREROUTING in turn. In mangle FORWARD the counter did not increase, meaning the packets were being lost at the routing stage. After re-checking the routes and rules, we started studying what exactly happens at this stage.
In the Linux kernel, the Reverse Path Filtering parameter (rp_filter) is enabled by default for every interface. If you use complex, asymmetric routing, where the reply packet returns to the source by a route other than the one the request packet arrived on, Linux will filter such traffic out. To solve this, Reverse Path Filtering must be disabled for all network devices participating in the routing. Below is a simple and quick way to do it for all of your network devices:
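One common variant loops over the per-interface sysctl files (the glob also covers the special "all" and "default" entries):

for f in /proc/sys/net/ipv4/conf/*/rp_filter; do echo 0 > "$f"; done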
Returning to the case: we solved the problem by disabling the Reverse Path Filter for the tap0 interface, and we now consider it good practice on routers to disable rp_filter for all devices participating in asymmetric routing.
Story two: TIME_WAIT
In a high-load web project that we maintain, an unusual problem arose: between 1 and 3 percent of users could not access the site. While studying the problem, we found that the unavailability did not correlate with the load on any system resource (disk, memory, network, etc.) and did not depend on the user's location or ISP. The only thing that all the affected users had in common was that they reached the Internet through NAT.
The TIME_WAIT state in the TCP protocol lets the system make sure that data transmission on a given TCP connection has really finished and that no data was lost. But the number of simultaneously open sockets is finite, which means it is a resource, and part of it is spent on the TIME_WAIT state, in which no client is being served.
The TCP connection close mechanism
The answer, as expected, was found in the kernel documentation. The natural desire of a highload system administrator is to reduce "idle" resource consumption. A quick search turns up plenty of advice recommending the Linux kernel options tcp_tw_reuse and tcp_tw_recycle. But tcp_tw_recycle is not as simple as it might seem.
- With tcp_tw_recycle enabled, the kernel checks that TCP timestamps from each peer IP address grow monotonically. Behind NAT, many clients share one IP address but have unrelated timestamp clocks, so connections from some of them are silently dropped. This matches the symptom above, which is why tcp_tw_recycle must not be enabled on servers with clients behind NAT (the option was removed entirely in Linux 4.12).
The tcp_tw_reuse parameter is useful in the fight for resources occupied by TIME_WAIT. A TCP connection is identified by the tuple IP1_Port1_IP2_Port2. When a socket enters the TIME_WAIT state, with tcp_tw_reuse disabled a new outgoing connection will be established with a new local IP1_Port1; the old values can be used only once the TCP connection reaches the CLOSED state. If your server creates many outgoing connections, set tcp_tw_reuse = 1 and your system will be able to use TIME_WAIT ports when free ones run out. To set it, add to /etc/sysctl.conf:
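net.ipv4.tcp_tw_reuse = 1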
And run the command:
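sysctl -p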
Story three: OSPF and multicast traffic
The corporate network we maintained was built on tinc VPN, with spokes of IPSec and OVPN connections attached to it. We used OSPF to route all of this L3 address space. On one of the nodes, which aggregated a large number of channels, we discovered that a small portion of the networks, despite a correct OSPF configuration, periodically disappeared from the routing table on that node.
Simplified layout of the VPN network used in the described project
First of all, we checked connectivity to the routers of the problem networks. It was stable:
Diagnosing OSPF surprised us even more. On the node where the problems were observed, the routers of the problem networks were missing from the neighbor list. On the other side, the problem router was present in the neighbor list:
As the next step, we ruled out problems with delivery of OSPF hello packets from 172.24.0.1. Its requests arrived, but replies did not go out:
No restrictions were set in iptables, and we established that the packet was being dropped after passing through all of the Netfilter tables. We dug into the documentation again and found the kernel parameter igmp_max_memberships, which limits the number of multicast group memberships for a single socket. By default it is 20. For a round number, we increased it to 42, and OSPF returned to normal:
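sysctl -w net.ipv4.igmp_max_memberships=42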
Conclusion
No matter how complex a problem is, it can always be solved, and often with the help of the documentation. I will be glad to see descriptions of your experience in finding solutions to complex and unusual problems in the comments.
What is the purpose of TIME WAIT in TCP connection tear down?
I found that the reason the active closer enters TIME WAIT is to make sure that the final ACK is not lost. But how does it know if the final ACK is lost? Will the passive closer resend the FIN and then the active closer knows the ACK was lost? Here is a picture of the TCP FSM.
3 Answers
Will the passive closer resend the FIN and then the active closer knows the ACK was lost?
Yes. Quoting from TCP/IP Illustrated Volume 1, in the TCP Connection Management section:
- To complete the close, the final segment contains an ACK for the last FIN. Note that if a FIN is lost, it is retransmitted until an ACK for it is received.
There is a timeout. When in LAST_ACK , the passive closer will resend FIN when there is a timeout, assuming that it was lost. If it was indeed lost, then the active closer will eventually receive the retransmitted FIN and enter TIME_WAIT . If the FIN was not lost but the final ACK was lost, then the active closer is in TIME_WAIT and receives FIN again. When this happens — receiving a FIN in TIME_WAIT — the ACK is retransmitted.
The timeout value in TIME_WAIT is NOT used for retransmission purposes. When there is a timeout in TIME_WAIT , it is assumed that the final ACK was successfully delivered because the passive closer didn't retransmit any FIN packets. So, the timeout in TIME_WAIT is just an amount of time after which we can safely assume that if the other end didn't send anything, it is because it received the final ACK and closed the connection.
Exploring Time_Wait status with the Linux netstat command
TIME WAIT state is a normal part of a TCP socket’s life cycle. It cannot and should not be avoided. TIME WAIT sockets may become an issue when there are tens of thousands active at any given time. Otherwise, smaller numbers of TIME WAIT sockets are normal.
TIME_WAIT: The host waits for a reasonable amount of time to ensure the remote host receives the final acknowledgment of a session termination request.
Netstat is a handy command to check network connections on a Linux system. We can use the netstat command to check which connections are in the time_wait state.
Today we will dive into this.
- What is the impact of time_wait Tcp connections?
- How to reduce the time_wait timer in Linux?
- Example of time_wait in Linux
What is time_wait state?
Time_wait is a state in the TCP connection process. A socket enters the TIME-WAIT state when, having initiated the close itself, it receives the FIN from the remote side and acknowledges it.
The socket then waits out the timer and closes itself automatically. This ensures that old connections are properly shut down and prevents errors or data corruption during network communication.
- The TIME WAIT state is part of the TCP protocol connection close, as described in RFC 9293 – Transmission Control Protocol, Section 3.6 Closing a Connection.
- The TIME WAIT state is entered by the Active Closer (the party who sends the first FIN) after they have received an ACK and a FIN from the Passive Closer, and sent an ACK to the Passive Closer’s last FIN.
- The RFC defines the time spent in the TIME WAIT state as "2 times MSL (Maximum Segment Lifetime)". The Linux kernel's implementation of TCP, however, hard-codes the TIME WAIT counter at 60 seconds.
Once the 2MSL timer expires, any packets from the old connection can be assumed to have drained from the network; the socket then exits the Time_Wait state and the port can be reused.
This prevents connections from being re-opened before all pending packets have been processed by the network.
Concerns about time_wait state
The Time_Wait state is necessary for the proper functioning of TCP and other networking protocols, but can cause some issues for applications that require frequent connection establishment.
For example, a web server might run out of available sockets due to too many connections being in Time_Wait.
To address this issue, some systems allow Time_Wait sockets to be reused once it is safe to do so; on Linux this is exposed through kernel parameters such as tcp_tw_reuse (and, in kernels before 4.12, the now-removed tcp_tw_recycle).
Used carefully, this makes more efficient use of the available network resources.
Time_wait can occur on the client side or the server side; it depends on which side terminates the TCP session. In the chart above, A is the active closer and B is the passive closer.
When A closes the connection, it sends a FIN packet to B. After A gets the ACK and FIN back from B and acknowledges that FIN, the TCP connection changes to time_wait on A's side. Time_wait happens on the active closer's side.
What is the impact of time_wait Tcp connections?
Time_wait state is a normal part of a TCP socket's life cycle, and small numbers of TIME WAIT sockets are normal. If there are a lot of time_wait sockets, they take time to expire, and while they linger they occupy local ports.
If our application needs to create new sockets during that window, it can fail because not enough free ports are left.
How to reduce the time_wait timer in Linux?
The RFC defines the time spent in TIME WAIT state as “2 times MSL (Maximum Segment Lifetime)”. But the Linux kernel’s implementation of TCP is hard-coded with a TIME WAIT counter of 60 seconds.
So there is no way to reduce this timer short of recompiling the kernel. But on some operating systems we can allow these ports to be reused by configuring kernel parameters.
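On Linux kernels that support it, the relevant knob can be enabled at runtime (it affects outgoing connections only):

# sysctl -w net.ipv4.tcp_tw_reuse=1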
Understanding netstat command
Netstat is a command-line tool used in networking to display network connections and statistics. It can be used to show active network connections, open ports, and other information related to network activity.
The command works on various operating systems including Windows, Linux, and macOS.
Some of the most common uses of netstat include:
- Displaying all active TCP/IP connections: netstat -a
- Displaying only listening server sockets: netstat -l
- Showing the status of all current network interfaces: netstat -i
- Displaying statistics for each protocol (TCP, UDP): netstat -s
The output generated by the netstat command can be quite detailed and may require some interpretation. For example, it shows the local address and port number being used by a program as well as the remote address and port number of the destination it’s communicating with.
Example of time_wait in netstat command
This is a normal tcp connection on our Cassandra server. We can use netstat -anpl to check the connection status in Linux.
tcp 0 115 10.253.113.116:37640 10.241.94.101:7000 ESTABLISHED 31945/java
Now let’s shutdown Cassandra on the server-side, we can see that the TCP connection became Time_wait.
tcp 0 0 10.253.113.116:37640 10.241.94.101:7000 TIME_WAIT —
If we see unexpected time_wait connections like this, it means the application terminated the connections; we should check on the application side what happened.
We can use this command to check the time_wait timer on Linux.
# ss --numeric -o state time-wait
Conclusion
In general, the Time_Wait state is important for maintaining network reliability, but can be a source of issues if there are too many connections in this state.
To prevent these issues and ensure efficient use of resources, it is best to monitor the number of sockets in Time_Wait and take action when needed.
This could involve enabling safe reuse of Time_Wait sockets or widening the ephemeral port range, so that connections lingering in this state do not exhaust the available resources.