In 2006, at NANOG 37, I joined Barrett Lyon and Todd Underwood in the first public sharing of data from commercially operated TCP Anycast platforms. We felt the data from operating our platforms flew in the face of the conventional wisdom at the time: everyone was spreading a lot of FUD.
Our vision that day was that *all* services on the internet should someday sit behind Anycast. At the time, this was ambitious and seemed totally unrealistic. With the announcement of AWS Global Accelerator, now seems like a good time to reflect on how far we've come.
In truth: really, really far! In the 12 years since our presentation, virtually every venture-funded CDN has been operating services behind TCP Anycast. Now, with Global Accelerator, 2 of the "big 3" cloud providers have anycast services in production for customers, and with Microsoft Front Door in preview, 3 out of 3 is coming any day now.
Even before these new cloud offerings, every user on the internet was already pulling content from TCP Anycast. This is awesome, and it was just a pipe dream in 2006.
With the good, however, comes the bad. There's still a lot of FUD! All the same FUD from 2006! Respected engineering groups have recently published claims, without supporting data, that anycast is unreliable, that some magic number of PoPs makes it stable, that unicast somehow reconverges better than anycast in the BGP table, or that you can't figure out how to drain a PoP.
12 years later: Still. All. The. FUD. The *same* FUD.
Dear engineers of 2018: it's totally okay to say "hey, this is what we see and we can't figure out why, so we tried something else". I get it, really: it's not the easiest thing in the world to get working globally. But (repeating from 2006) please stop saying it doesn't work.
Just because you can't figure out a problem doesn't mean nobody else can, or that your implementation failures are universal!
I promise you: the commercial Anycast offerings are not "unreliable in the high percentiles", and AWS, Google, and Microsoft aren't launching services that will be sub-optimal because "people can change localpref in their networks and therefore will route things in ways that hurt performance, and they can't influence inbound". FUD.
Even AWS engineers were quietly tweeting about overcoming the "challenges" of Anycast to make Global Accelerator work, and about how they had only now, in 2018, managed to come up with solutions. These were the same "challenges" people were citing in 2006. We've been overcoming them since 2002. They are solved.
Of course, there are some legitimate problems. Draining connections from a PoP, or from servers within a PoP, is hard. Lots of things are hard. But hard things can be overcome: those of us running Anycast in production have tooling that works for us, and perhaps as a community we need to share more about operating at scale in production.
In the next few months I'll be posting a series of articles sharing some of our workflows and tooling for managing Anycast, including how we gracefully drain traffic, how we quickly catch traffic shifting in the wild, and much more.
In the meantime, if you're trying to deploy TCP Anycast and you're running into problems, don't give up: ask someone for help!
There are solutions, and it works really well, we promise. True in 2002, true today.