Thursday, June 6, 2013

RTT - a lot more than time to first byte

Lots of interesting stuff in this paper: Understanding the latency benefits of multi-cloud webservice deployments.

One of the side angles I find interesting is that even if you built an idealized, fully replicated custom CDN using multiple datacenters on multiple clouds, there would still be huge swaths of the Internet population with browser-to-server RTTs over 100ms. (China, India, Argentina, Israel!)

RTT is the hidden driver behind so much of Internet performance, and to whatever extent possible it needs to be quashed. It's so much more than time to first byte.

Bandwidth, CPU, and even our ability to milk more data out of radio spectrum all improve much more rapidly than RTT can.

Making RTT better is really a physical problem and is, in the general case at least, really, really hard. Making things less reliant on RTT is a software problem, and that is a more normal kind of hard. Or at least it seems that way if you're a software developer.
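To see why the physical side is so stubborn, here's a back-of-envelope sketch (the distances and the roughly-two-thirds-of-c figure for light in fiber are my own illustrative assumptions, not numbers from the paper): propagation delay alone puts a hard floor under RTT, no matter how good the servers at each end are.

    # Back-of-envelope lower bound on RTT from propagation delay alone.
    # Assumes light in fiber travels at roughly 2/3 of c and the route is
    # a straight line; real paths are longer, so real RTTs are worse.

    FIBER_SPEED_KM_S = 200_000  # ~2/3 of the speed of light, km/s

    def min_rtt_ms(one_way_km):
        """Minimum round-trip time in milliseconds for a one-way distance."""
        return (2 * one_way_km / FIBER_SPEED_KM_S) * 1000

    for label, km in [("same metro", 50),
                      ("cross-country US", 4000),
                      ("US to India", 13000)]:
        print("%-16s >= %.1f ms" % (label, min_rtt_ms(km)))

No protocol work shrinks those floors; only moving the endpoints closer together does.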

RTT is not just the time to the first byte of a web request. It's DNS. It's the TCP handshake. It's the SSL handshake. It's the time it takes to do a CORS check before an XHR. It's the time it takes to grow the congestion window (i.e. how fast you ramp up from slow start) or confirm the revocation status of the SSL certificate. It's the time it takes to do an HTTP redirect. And it's the time you're stalled while you recover from a packet loss. They're all some multiple of RTT.
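To put rough numbers on that list, here's a sketch of how a cold, cross-origin HTTPS request can stack up round trips before the first useful byte arrives. The step counts are assumptions for illustration, not measurements; DNS, OCSP, and redirects are often cached or skipped, and TLS session resumption cuts the handshake down.

    # Illustrative tally of round trips for a cold cross-origin HTTPS
    # request over HTTP/1.x. Counts are assumptions, not measurements.

    RTT_MS = 100  # assumed browser-to-server round-trip time

    steps = [
        ("DNS lookup",            1),
        ("TCP handshake",         1),
        ("TLS handshake (full)",  2),
        ("OCSP revocation check", 1),  # itself a whole connection setup
        ("CORS preflight",        1),
        ("HTTP redirect",         1),
        ("request / first byte",  1),
    ]

    total = sum(n for _, n in steps)
    for name, n in steps:
        print("%-24s %d x RTT" % (name, n))
    print("total: %d x RTT = %d ms before any response body" % (total, total * RTT_MS))

At a 100ms RTT that's the better part of a second spent doing nothing but waiting.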

HTTP/2 and SPDY take the first step toward dealing with this by creating a prioritized and multiplexed protocol, which tends to be bandwidth bound more often than HTTP/1 is. But even that protocol still has lots of hidden RTT scaling algorithms in it.
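Congestion control is one of those hidden RTT-scaling algorithms: even on a single multiplexed connection, slow start meters out data one window per round trip. A rough sketch, assuming an initial window of 10 segments of about 1460 bytes, window doubling each round trip, and no loss:

    # Round trips slow start needs to deliver a response, assuming an
    # initial congestion window of 10 segments of ~1460 bytes, window
    # doubling each RTT, and no packet loss.

    SEGMENT_BYTES = 1460
    INIT_CWND_SEGMENTS = 10

    def rtts_to_deliver(response_bytes):
        cwnd = INIT_CWND_SEGMENTS * SEGMENT_BYTES
        sent, rtts = 0, 0
        while sent < response_bytes:
            sent += cwnd
            cwnd *= 2  # exponential growth during slow start
            rtts += 1
        return rtts

    for kb in (15, 100, 500):
        print("%4d KB -> %d RTTs of transfer time" % (kb, rtts_to_deliver(kb * 1024)))

So on a fresh connection over a fast link, a few hundred kilobytes still costs several round trips of transfer time, no matter how much bandwidth is sitting idle.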