Saturday, August 4, 2007

Lamenting ECN deployments

Explicit Congestion Notification (ECN) has had lots of papers written about it. It has also been in various stages of deployment for almost 10 years now. I generally agree with those who feel it is a good thing. Beyond the scientific data, which is always important of course, at a gut level separating data loss from congestion notification is an obviously good thing to do - they are simply different things, and TCP's current practice of implicitly overloading packet loss as its congestion signal creates unnecessary overflows and over-conservative backoffs.

But the sad fact is that ECN just isn't relevant to the real world Internet. If you can't become relevant in 8 years or so, it is probably time to try something else. For a long time the issue was getting over interop problems with NATs and firewalls. Then there was the matter of getting widespread client and router deployments. Linux has had client support for a long time, other Unixes more recently, and Microsoft Vista is the first MS OS to include ECN support at all - but it ships disabled by default.

However, it seems that even as clients are catching up, the routing infrastructure still isn't playing along.

I took a sample from my home network to see if ECN was relevant at all to my computing life. The summary is that ECN is irrelevant in my data set.

My network runs a lot of Linux with ECN enabled, so if anything ECN will be over-represented on my network compared to the Internet at large. The sample covered
  • 8.5 days
  • 1,227,473 IP packets
  • 40,914 TCP flows
Of those 41K flows, almost 8 percent (3,268) negotiated ECN between the peers. As I already mentioned, I suspect this pretty meager number is higher than that of the Internet at large.
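For anyone who wants to check the arithmetic, here is a minimal sketch. The two counts come straight from the trace summary above; the variable names are just mine for illustration.

```python
# Flow counts taken from the trace summary in this post.
total_tcp_flows = 40_914
ecn_negotiated_flows = 3_268

# Fraction of TCP flows in which both peers agreed to use ECN.
ecn_share = ecn_negotiated_flows / total_tcp_flows

print(f"ECN-negotiated flows: {ecn_share:.2%}")
```

The result lands just under 8 percent, which is where the "almost 8 percent" figure comes from.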

8% - that's great and might make a real contribution. But without router support, those eligible flows won't actually use ECN in any meaningful sense. There are two signs of ECN actually taking root with an intermediate router:
  • The TCP ECN ECHO flag bit is set on an incoming packet. This tells the receiver that an earlier packet it sent was marked by a router on its way to the destination. The peer sets this flag in order to tell the original sender to slow down so that doesn't keep happening.
  • The IP header ECN codepoint is set to 3 (binary 11, the CE or "Congestion Experienced" codepoint) on an incoming packet. This indicates to the receiver that the packet hit some congestion on the way and a router set this codepoint to mark that fact.
There were no ECN ECHO flags in the entire trace, except those set on SYN packets where it is used in order to negotiate endpoint knowledge of ECN. When appearing on a SYN it is not a sign of router support.
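The two checks above, plus the SYN exception, can be sketched roughly as follows. This is not the script I ran against the trace. It is a minimal illustration that assumes you already have the IP TOS (or IPv6 traffic class) byte and the TCP flags byte for each packet as integers. The bit positions follow RFC 3168; the function names are hypothetical.

```python
# ECN field: the low two bits of the IPv4 TOS / IPv6 traffic class byte (RFC 3168).
ECN_MASK = 0x03
NOT_ECT = 0b00  # sender is not ECN-capable
ECT_1 = 0b01    # ECN-capable transport, codepoint ECT(1)
ECT_0 = 0b10    # ECN-capable transport, codepoint ECT(0)
CE = 0b11       # Congestion Experienced ("codepoint 3")

# Relevant TCP flag bits.
TCP_SYN = 0x02
TCP_ACK = 0x10
TCP_ECE = 0x40  # ECN ECHO
TCP_CWR = 0x80  # Congestion Window Reduced


def ecn_codepoint(tos: int) -> int:
    """Return the 2-bit ECN codepoint from a TOS / traffic class byte."""
    return tos & ECN_MASK


def is_ecn_setup_syn(tcp_flags: int) -> bool:
    """Initial SYN offering ECN: SYN set, ACK clear, ECE and CWR both set."""
    wanted = TCP_SYN | TCP_ECE | TCP_CWR
    return (tcp_flags & (wanted | TCP_ACK)) == wanted


def is_ecn_setup_synack(tcp_flags: int) -> bool:
    """SYN-ACK accepting ECN: SYN, ACK and ECE set, CWR clear."""
    wanted = TCP_SYN | TCP_ACK | TCP_ECE
    return (tcp_flags & (wanted | TCP_CWR)) == wanted


def router_signs(tos: int, tcp_flags: int) -> list:
    """List the apparent signs of router ECN marking on one incoming packet."""
    signs = []
    if ecn_codepoint(tos) == CE:
        signs.append("CE")
    # ECE on a SYN or SYN-ACK is negotiation only, not evidence of a router mark.
    if (tcp_flags & TCP_ECE) and not (tcp_flags & TCP_SYN):
        signs.append("ECE")
    return signs
```

Note that a "CE" result from this check is only an apparent sign: it means something only if the flow actually negotiated ECN during its handshake.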

There were 87 packets (over 8 different TCP flows) with the ECN codepoint of 3 - normally indicating congestion marks added by a router. However, that would be a bogus conclusion in this case: it does not appear that a single one of my 1.2 million packets was actually marked as congestion-influenced.

I can say conclusively that the 87 codepoint 3 packets are false positives because none of the 8 flows containing those packets were included in the 8% of flows that had successfully negotiated the use of ECN. The peers must have been using the codepoint to indicate something different entirely. Each flow was with a different host, though they were all SMTP MTAs based in Europe (7 of them in Germany).
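The false-positive test is just a set difference: take the flows that carried a codepoint 3 packet and remove the ones that negotiated ECN. Here is a small sketch of that reasoning. The flow keys are made-up examples using documentation address ranges, not hosts from my trace, and the function name is hypothetical.

```python
def unexplained_ce_flows(ce_flows, ecn_negotiated_flows):
    """Return flows that carried a CE-marked packet without negotiating ECN.

    A router should only set CE on packets from an ECN-capable flow, so a
    CE mark on any flow returned here cannot be a genuine congestion signal.
    """
    return set(ce_flows) - set(ecn_negotiated_flows)


# Hypothetical flow keys: (local addr, local port, remote addr, remote port).
negotiated = {
    ("192.0.2.10", 40001, "198.51.100.5", 80),
    ("192.0.2.10", 40002, "198.51.100.6", 22),
}
saw_ce = {
    ("192.0.2.10", 25, "203.0.113.7", 51000),  # inbound SMTP, no ECN handshake
    ("192.0.2.10", 25, "203.0.113.8", 51001),  # inbound SMTP, no ECN handshake
}

bogus = unexplained_ce_flows(saw_ce, negotiated)
genuine = set(saw_ce) - bogus

print(f"{len(bogus)} CE flows without ECN negotiation, {len(genuine)} with")
```

In my trace the equivalent of `genuine` was empty: all 8 flows with codepoint 3 packets fell into the unexplained bucket.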

Sally Floyd says on her webpage:
"David Moore from CAIDA reports that in measurements at one link, 0.1% of the packets had the CE codepoint set. Either this codepoint was being used for some other purpose, or there is some deployment of ECN capability in routers as well as in TCP stacks."
My data at least hints that the hope that there is some deployment of ECN capability in routers is over-optimistic.

The final wrapup - my traces are characterized by meager (< 10%) client support and no indication of router support at all.

I was inspired to take a look at this by the work of Bob Briscoe, who is championing an IETF BoF on Re-ECN, which is an attempt to make lemonade from lemons and re-invigorate the basic underlying technology.