Timer interpretation
Related Source Code
The implementation of timers is divided into two parts: the JavaScript layer and the libuv layer. timer_wrap.cc acts as a bridge to facilitate the interaction between JavaScript and C++.
Use Cases
The main use cases or applicable scenarios for timers are:
Scheduled tasks, such as timed status checks in business logic;
Timeout control, such as network timeout control for retransmission.
In the implementation of Node.js, for example in the HTTP module:
function responseOnEnd() {
// omitted
debug('AGENT socket keep-alive');
if (req.timeoutCb) {
socket.setTimeout(0, req.timeoutCb);
req.timeoutCb = null;
}
}You may wonder: why is it used in the HTTP module?
We know that the HTTP protocol uses a "request-response" mode. When using the normal mode, that is, the non-KeepAlive mode, each request/response between the client and server requires a new connection, which is immediately disconnected after completion (HTTP protocol is a connectionless protocol); when using the Keep-Alive mode (also known as persistent connection, connection reuse), the Keep-Alive function keeps the client-to-server connection valid, avoiding the need to establish or re-establish a connection when subsequent requests are made to the server.
In HTTP/1.0, it is turned off by default and needs to be enabled by adding "Connection: Keep-Alive" to the HTTP header; in HTTP/1.1, Keep-Alive is enabled by default, and "Connection: close" is added to turn it off.
Currently, most browsers use the HTTP/1.1 protocol, which means that Keep-Alive connection requests are sent by default. Node.js judges and processes the two protocols according to the above code.
Of course, this connection cannot be kept alive forever, so there is usually a timeout period. If the client has not sent a new HTTP request after this period, the server needs to automatically disconnect in order to continue to serve other clients. The HTTP module of Node.js creates a socket object for each new connection and calls socket.setTimeout to set a timer for automatic disconnection after timeout.
Data Structure Selection
A Timer is essentially a data structure where tasks with closer deadlines have higher priority. It provides the following three basic operations:
schedule: add a task
cancel: delete a task
expire: execute expired tasks
Linked List
O(1)
O(n)
O(n)
Sorted Linked List
O(n)
O(1)
O(1)
Min Heap
O(lgn)
O(1)
O(1)
Time Wheel
O(1)
O(1)
O(1)
The implementation of timers has undergone changes, each of which is a spark of collision of ideas. Let's dive into the source code and savor it carefully.
libuv Implementation
Data Structure - Min Heap
A min heap is a binary heap, which is a complete binary tree or an approximately complete binary tree. It is divided into two types: a maximum heap and a minimum heap. Maximum heap: the key value of the parent node is always greater than or equal to the key value of any child node; minimum heap: the key value of the parent node is always less than or equal to the key value of any child node. The schematic diagram is as follows: 
The node is defined in deps/uv/src/heap-inl.h as follows:
The root node is defined as follows:
Here we can see clearly that the min heap organizes data with pointers, not arrays. min always points to the smallest node if it exists. As a sorted set, it also needs a user-specified comparison function to determine which node is smaller, or when the expiration time is the same, to determine their order. After all, there are no rules without rules.
Here we can see that first, the timeout of the two is compared. If the two are the same, then the id of the two that were schedule is compared. This id is generated by loop->timer_counter in uv_timer_start and assigned to start_id.
Implementation
L68-L69, Check the parameters and return -EINVAL if there is an error.
L71-L72, If there is an active timer, stop it immediately.
L74-L82, Assign parameters, and the
start_idmentioned above is obtained by incrementingtimer_counter.L84-L86, Insert the timer node into the min heap, and the algorithm complexity here is O(lgn).
L87, Mark the handle as inactive and add it to the statistics.
L94, Check the handle. If it is inactive, it means it has not been started, so return success. L97-L99, Remove the timer node from the min heap. L100, Reset the handle and decrement the count.
After understanding how to start and stop a timer, let's see how to schedule a timer.
In the event loop of node.js, after updating the time, uv__run_timers is called immediately, indicating that timers, as an external system dependency module, have the highest priority.
L155-L157, Get the minimum timer node. If it is empty, exit the loop. L159-L161, Get the object's address through the offset of heap_node. If the minimum timeout is greater than the current time, it means that the expiration time has not yet arrived, so exit the loop. L163-L165, Delete the timer. If it is a timer that needs to be executed repeatedly, it is added again by calling uv_timer_again. After executing the timer's callback task, loop again.
Improved hierarchical time wheel implementation
https://github.com/libuv/libuv/pull/823
Bridge Layer
This section requires knowledge of node.js addon. It is assumed that you already have this knowledge.
The Timer addon exports the start and stop methods for use by the JS layer.
Start requires two parameters: 1. timeout; 2. the period of repeated execution. L78 calls uv_timer_start, where OnTimeout is the callback function for the timer. Let's take a look at the implementation of this function:
You may be wondering how handle->data retrieves the object pointer.
Since TimerWrap inherits from HandleWrap, the data private variable of the handle is pointed to the HandleWrap object, which is this pointer. The callback function retrieves the TimerWrap object by casting.
What's interesting is L96, where C++ calls JS. By checking the modification history of this location, I found that:
timers: dispatch ontimeout callback by array index
Achieve a minor speed-up by looking up the timeout callback on the timer object by using an array index rather than a named property.
Gives a performance boost of about 1% on the misc/timers benchmarks.
The previous implementation used property lookup, and through extreme optimization, property lookup was replaced with array indexing, resulting in a 1% performance improvement in the benchmark. The overall performance improvement comes from these incremental improvements.
timers.js
With the bridge layer, JS has the ability to start and stop a timer.
To avoid affecting the event loop in node.js, the timer module provides some internal APIs, such as timers._unrefActive, for objects like sockets.
In the initial design, each time _unrefActive adds a task, it maintains the order of unrefList to ensure that the object with the smallest timeout is at the front. This way, when the timer times out, it can process the timeout task at the fastest speed and set the next timer. However, in the worst case, when adding a task, it needs to traverse all nodes in the unrefList linked list.
L524-L534, only create one unrefTimer to handle timeouts for internal use, processing one and then the next.
L553-L571, when inserting a timer, it is necessary to ensure that unrefList is ordered, requiring traversal of the linked list to find the insertion point, which is O(N) in the worst case.
Obviously, establishing a connection in HTTP is the most frequent operation, so adding nodes to the unrefList linked list is also very frequent, and the initially set timer is actually very rarely timed out, because the normal operation of io will cancel the timer in the middle. So the problem becomes that the most performance-consuming operation is very frequent, while the operation that takes almost no time is rarely executed.
How to solve this problem?
Obviously, this also follows the 80/20 principle. In terms of ideas, we should make 80% of the cases more efficient.
Use an unsorted linked list
The main idea is to move the traversal operation of the unrefList linked list to the unrefTimeout timer timeout processing. This way, finding the timed-out tasks requires more time each time, which is O(n), but the insertion operation becomes very simple, which is O(1), and inserting nodes is the most frequent operation.
You can see that L588, previously traversed lookup in the new implementation e5bb668, simply becomes an abstract List 'append' operation.
https://github.com/joyent/node/issues/8160
Use a binary heap
A binary heap achieves a balance between insertion and lookup, which is consistent with the current implementation of libuv. For those interested, please refer to:
https://github.com/misterdjules/node/commits/fix-issue-8160-with-heap, based on v0.12.
Community-improved implementation
The implementation using an ordered linked list only uses one
unrefTimerto execute tasks, which saves memory but is difficult to achieve performance balance.The binary heap implementation is inferior to an unsorted linked list in normal connection scenarios.
Through evolution, the community-improved implementation uses a combination of hash tables and linked lists to trade space for time. In fact, it is an evolution of the time wheel algorithm.
Let's first take a look at the organization of the data structure:
The keys of
refedListsare the timeout durations, and the values are linked lists with the same timeout duration.unrefedListsis the same.
Let's compare the above implementation:
L128, get the list according to the key (timeout time), if it is not undefined, simply
appendit to the end, complexity O(1).L130-L141, if it is undefined, create a
TimersList, which contains a C timer to handle tasks in the linked list.listOnTimeoutalso becomes very simple, taking out the tasks in the linked list, complexity depends on the length of the linked list O(m), m < N.
The module uses a linked list to store all objects with the same timeout time. Each object stores the start time _idleStart and the timeout time _idleTimeout. The first object added to the linked list will always time out before the later added objects. When the first object completes its timeout processing, the next object's timeout time is recalculated to see if it has already timed out or how long it will take to time out. The previously created Timer object will be restarted and set with a new timeout time until all objects on the linked list have completed their timeout processing, at which point the Timer object will be closed.
Through this clever design, a Timer object is maximally reused, greatly improving the performance of the timer module.
Application of Timer in Node.js
Dynamically update the HTTP Date field cache
L230, the utcDate() function is used to dynamically update the HTTP Date field cache. The function constructs a new Date object and stores its UTC string representation in the dateCache variable. The function then enrolls itself in the timer module to be called again in 1 second, and marks itself as unrefed to avoid keeping the process open unnecessarily.
L34-L35, the dateCache value is reused for subsequent calls to utcDate() within 1 second.
L36-L37, the timer for utcDate() is reset to update the cache after 1 second has elapsed.
HTTP connection timeout control
The default timeout is this.timeout = 2 * 60 * 1000;, which is 120s. L313, the socket is destroyed if it times out.
Summary
The timer module in Node.js embodies many programming design principles:
Data structure abstraction
linkedlist.js abstracts the basic operations of linked lists.
Space-time tradeoff
Timers with the same timeout time are grouped instead of using a single
unrefTimer, reducing complexity to O(1).
Object reuse
Timers with the same timeout time share a C timer at the bottom.
80/20 rule
Optimize the performance of the main path.
Reference Documents
[1].https://github.com/nodejs/node/wiki/Optimizing-_unrefActive
[2].http://alinode.aliyun.com/blog/9
Last updated