Tail Latency Dominates User Experience
In fan-out architectures, the overall request latency is determined by the slowest component. As the number of parallel calls grows, the probability of hitting a tail latency outlier approaches certainty.
"You can horizontally scale your way toward any throughput target, but there is no easy fix for latency spikes." (Yao Yue)
When a single user request fans out to 100 backend services in parallel, the user waits for the slowest one. If each backend has a 1% chance of being slow, the probability that at least one is slow is 1 − 0.99^100 ≈ 63%. At 1000 services, it is 99.996%. Tail latency that is invisible at the component level becomes the dominant factor in user-perceived performance at scale. This is why Twitter maintains a strict SLO of p999 under 5 milliseconds for cache operations: at their scale, even 99.9th-percentile outliers are encountered constantly.
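The arithmetic above follows directly from the independence assumption; a minimal sketch (function name and example probabilities are illustrative, not from the original):

```python
def p_any_slow(p_slow: float, n: int) -> float:
    """Probability that at least one of n independent parallel calls
    lands in the slow tail, given each has probability p_slow."""
    return 1 - (1 - p_slow) ** n

# Each backend is slow 1% of the time:
print(round(p_any_slow(0.01, 100), 2))   # → 0.63
print(round(p_any_slow(0.01, 1000), 5))  # → 0.99996
```

The assumption of independence is conservative in practice: correlated causes (a shared GC pause, a noisy neighbor) can make the tail even heavier.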
The sources of tail latency are varied and often surprising. Shared resources create contention: a background garbage collection pause, an OS scheduling decision, a burst of network interrupts, cache pollution from a co-located workload. These are not bugs in your code; they are emergent properties of running on shared infrastructure. The tail is often determined by system factors outside the application's control, including suboptimal CPU scheduling, interrupt bursts, and CPU migration between cores.
Effective strategies work at multiple levels. At the infrastructure level, separate control-plane from data-plane processing, use dedicated queues and CPU affinity to isolate latency-sensitive work, and batch operations to reduce context-switch overhead. At the application level, prefer partial order over total order (quorum reads instead of waiting for all replicas) to get best-K-of-N performance. Hedge requests by issuing redundant calls and taking the first response. And recognize that predictability is more valuable than peak throughput: a system that reliably serves p999 under 5 ms is more useful than one that averages 1 ms but occasionally spikes to 500 ms.
Takeaway: Design for the tail, not the median. At scale, your slowest component is your user's experience.
See also: Latency Sneaks Up On You | Goodput Matters More Than Throughput | The Fundamental Mechanism of Scaling Is Partitioning