Queues Do Not Smooth Load: They Defer Pain

Queues exist to absorb burstiness, not to buffer sustained overload. A queue in front of a system that cannot keep up is not a solution; it is a delayed failure with worse latency characteristics.

"Queues exist only to handle burstiness. If traffic flowed at a constant rate, every queue would be empty except the bottleneck, which would be full (bufferbloat), no matter its limit." (Avery Pennarun)

If arrival rate exceeds service rate for any sustained period, a queue does not help; it just accumulates work that will eventually be dropped or served so late it is useless. The correct size for a queue is a statistically large burst (e.g., the 99th percentile of burst size), not "as big as possible." A queue that is too large creates bufferbloat: items sit in the queue for so long that by the time they are processed, the requester has given up. The work is wasted, and worse, it displaces newer, more valuable work.
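The "size for bursts, shed beyond that" idea can be sketched in a few lines. This is a minimal illustration, not any particular library's API; the bound `BURST_P99` and the helper `submit` are hypothetical names, and in practice the bound would come from measured burst statistics.

```python
import queue

# Hypothetical 99th-percentile burst size, assumed measured from real traffic.
BURST_P99 = 64

# Bounded queue: its capacity is a large burst, not "as big as possible".
work: "queue.Queue[int]" = queue.Queue(maxsize=BURST_P99)

def submit(item: int) -> bool:
    """Try to enqueue; shed load immediately if the queue is full."""
    try:
        work.put_nowait(item)
        return True
    except queue.Full:
        return False  # caller learns of overload now, not via a timeout later

# Offer more work than one burst: everything past the bound is rejected,
# not silently buffered into latency.
accepted = sum(submit(i) for i in range(BURST_P99 + 10))
print(accepted)  # 64
```

The point of `put_nowait` is that rejection is immediate and visible; a blocking put would just turn the caller into another implicit, unbounded queue.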

The subtleties of queue management matter enormously. When multiplexing (combining multiple input streams into one queue), use backpressure so the upstream can make intelligent decisions about what to drop. When demultiplexing (splitting one queue into multiple output streams), drop packets at the output rather than using backpressure; otherwise a single slow consumer can starve all the others. Tail drop (dropping the newest packet when full) is almost always the worst strategy; even simple head drop is better in many cases, because later TCP ACKs encompass the information from earlier ones, so the oldest item in the queue is the most expendable.
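The difference between tail drop and head drop is easiest to see side by side. A minimal sketch, assuming a toy capacity of 4; the function names and capacity are illustrative, not from any networking stack:

```python
from collections import deque

CAP = 4  # illustrative queue capacity

def tail_drop(q: deque, item: int) -> None:
    """Discard the NEWEST item when full: stale work survives."""
    if len(q) < CAP:
        q.append(item)
    # else: the incoming item is silently dropped

def head_drop(q: deque, item: int) -> None:
    """Evict the OLDEST item when full: fresh work survives."""
    if len(q) >= CAP:
        q.popleft()  # the oldest item is the most stale, so drop it
    q.append(item)

a: deque = deque()
b: deque = deque()
for i in range(6):  # offer 6 items to a capacity-4 queue
    tail_drop(a, i)
    head_drop(b, i)

print(list(a))  # [0, 1, 2, 3] -- oldest items survive, newest are lost
print(list(b))  # [2, 3, 4, 5] -- newest items survive, stalest are lost
```

With cumulative ACKs, keeping `[2, 3, 4, 5]` loses nothing that the newer items do not already convey, which is why head drop tends to win for that traffic.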

These principles apply far beyond network queues. Every thread pool, connection pool, request buffer, and async task scheduler is a queue subject to the same dynamics. "Implicit queues are everywhere": threads waiting on a lock, async tasks waiting for I/O completion, requests waiting for a database connection. The TIGER_STYLE principle of "put a limit on everything" exists precisely because unbounded queues are unbounded latency in disguise. Every queue must have a fixed upper bound, and exceeding that bound should be a clear signal to shed load, not silently accumulate it.

Takeaway: Size your queues for bursts, not for sustained overload; an unbounded queue is just an out-of-memory error that has not happened yet.


See also: Latency Sneaks Up On You | Goodput Matters More Than Throughput | Tail Latency Dominates User Experience