
Anycast – “Think” before you talk – Part II

This post is Part II of Anycast – “Think” before you talk.

Directing Users to PoPs

Traditional methods to route users to the closest PoP rely on DNS-based geographic load balancing. DNS attempts to map user requests to the nearest PoP by handing out records based on the user's latitude and longitude. Essentially, the IP address of the PoP is handed to the user based on the IP of the resolver, not the actual client IP address.
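
As a rough illustration, a geo-DNS decision keyed on the resolver's address might look like the sketch below; the prefixes, PoP names and lookup table are all invented for illustration, not CacheFly's implementation:

```python
import ipaddress

# Hypothetical resolver-prefix -> PoP table; prefixes and PoP names are invented.
GEO_TABLE = {
    ipaddress.ip_network("203.0.113.0/24"): "syd1",   # APAC resolvers
    ipaddress.ip_network("198.51.100.0/24"): "fra1",  # EU resolvers
    ipaddress.ip_network("192.0.2.0/24"): "ord1",     # US resolvers
}
DEFAULT_POP = "ord1"

def pick_pop(resolver_ip: str) -> str:
    """Return the PoP whose prefix contains the *resolver's* IP.

    Note: the client's real IP never enters the decision -- this is the
    core inaccuracy of DNS-based geographic load balancing."""
    addr = ipaddress.ip_address(resolver_ip)
    for prefix, pop in GEO_TABLE.items():
        if addr in prefix:
            return pop
    return DEFAULT_POP

# A client in Sydney using a resolver hosted in Europe is sent to fra1.
print(pick_pop("198.51.100.53"))  # -> fra1
```

The decision is only as good as the assumption that clients sit near their resolvers.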

Benefits of DNS-based LB

One of the main advantages of DNS-based load balancing is control – administrators can direct any DNS request to any node. This is useful for traffic management purposes. It also offers flexibility regarding deployment, as you don't need common carriers or to deal with the intricacies of Anycast routing on the Internet.

However, there are plenty of tradeoffs, as it does not handle topology changes well out of the box, unlike the Anycast-based approach.

Challenges of DNS-based LB

The majority feel it's a very naive approach with plenty of shortcomings and suboptimal user placement. The issues mainly arise from PoP failover, resolver proximity, low TTLs and timeouts.

Suboptimal decisions are made because the DNS mapping is based on the user's name server IP, not the client's actual IP address, which makes DNS-based load balancing an inaccurate method for client proximity routing. The query comes from the client's DNS server, not from the client itself. As a result, administrators can only ever optimise the performance metrics for the DNS resolver. This has changed in the last few years with EDNS Client Subnet (ECS), but it's not yet implemented in all resolvers.
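
The ECS idea can be sketched as follows; the function name, subnet values and the /24 fallback are illustrative assumptions, not a real resolver's behaviour:

```python
import ipaddress

def geo_key(resolver_ip, ecs=None):
    """Return the subnet a geo-DNS decision is keyed on.

    `ecs` is the client subnet (e.g. "203.0.113.0/24") that an
    ECS-supporting resolver forwards; without it, the resolver's own
    address is the best information available. Values are illustrative."""
    if ecs:
        return ipaddress.ip_network(ecs)               # real client locality
    # No ECS: approximate the client with the resolver's /24.
    return ipaddress.ip_network(f"{resolver_ip}/24", strict=False)

print(geo_key("198.51.100.53"))                        # resolver-based guess
print(geo_key("198.51.100.53", ecs="203.0.113.0/24"))  # actual client subnet
```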

DNS doesn't fail very gracefully when you are in the middle of a session and need to be rerouted to a different PoP. Users have to close their browsers and reopen them to start working again.

DNS TTLs can cause lag and performance issues. Whenever you decide that you need to change your answer, you have to wait for the DNS TTL to expire. During a failover scenario, the TTL of the response must be reached before clients change locations. Unfortunately, some applications hold on to these values for a long time. Setting a low TTL mitigates this, but the tradeoff is performance, as resolvers must frequently re-request the same DNS record.
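
The tradeoff can be put into rough numbers; the model below (one re-query per resolver per TTL, and a cached answer going stale for up to one full TTL) is a deliberate simplification:

```python
def ttl_tradeoff(ttl_seconds, unique_resolvers, window_seconds=3600):
    """Worst-case failover lag and resolver query load for a given TTL.

    Rough model: each resolver re-requests the record once per TTL;
    a cached answer can point at a dead PoP for up to one full TTL."""
    queries_per_window = unique_resolvers * (window_seconds / ttl_seconds)
    return ttl_seconds, queries_per_window

for ttl in (300, 30):
    lag, qph = ttl_tradeoff(ttl, unique_resolvers=100_000)
    print(f"TTL {ttl:>3}s: up to {lag}s stale, ~{qph:,.0f} queries/hour")
```

Dropping the TTL tenfold buys a tenfold faster worst-case failover, but also a tenfold increase in query load on the authoritative servers.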

The Anycast Approach

Instead of giving out a different IP, Anycast is a mechanism that announces the same IP address from multiple locations. Anycast is nothing special – simply a route with multiple next hops. It's not too different from Unicast; an NLRI object has multiple next hops instead of one. All the magic happens in how you deliver the packet, not in the underlying network transporting it.

When you advertise multiple destinations, the shortest path is chosen based on the user's location. Therefore, traffic organically lands where it should, as opposed to direct control based on GEO IP. Anycast does not rely on a stale GEO IP database, and performance rests on the natural flow of the Internet.
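
A toy view of that selection step: the same prefix is learned from several PoPs, and BGP-style shortest-AS-path selection picks the "closest" one. The AS paths and PoP names here are invented for illustration:

```python
# The same anycast prefix learned via three different PoPs.
routes = {
    "192.0.2.0/24 via lhr1": ["AS65010", "AS65020", "AS65099"],
    "192.0.2.0/24 via jfk1": ["AS65030", "AS65099"],
    "192.0.2.0/24 via nrt1": ["AS65040", "AS65050", "AS65060", "AS65099"],
}

# Shortest AS path wins, mimicking one step of BGP best-path selection.
best = min(routes, key=lambda r: len(routes[r]))
print(best)  # -> 192.0.2.0/24 via jfk1
```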

The resolver IP is not used; instead, the client IP is used for Anycast routing. This subtle difference offers a more accurate view of where the users are located. Users can use whatever resolver they want and will still get the same assignment. As a result, the client's DNS server becomes irrelevant: whatever the question, the answer will be the same.

With an Anycast design, there are trade-offs between performance and stability. Anycast works best with a metro or regional design and with single-PoP-per-location deployments. Multiple PoPs per location might run you into some problems. As a general best practice, the more physical space you have between your PoPs, the more stable the overall architecture will be.

Anycast Organic Traffic

Natively, Anycast is not load aware. Large volumes of inbound traffic could potentially saturate a PoP. While this is also true for Unicast traffic, DNS-based routing offers better control over PoP placement, as you can hand out specific IP blocks for specific locations.

The DNS answer may be suboptimal, but it still represents a better level of supervision for traffic management purposes. With the Anycast approach to PoP placement, organic traffic flows naturally to each PoP location; you can't control this. Some control was given up moving from traditional DNS-based routing to a TCP-Anycast CDN. So what's the best line of action to take under these circumstances? Should you oversubscribe each PoP to account for the lack of control?

First and foremost, when it happens, you need to be aware of it. It's not acceptable to be unaware of a flood of traffic entering your network. The right monitoring tools need to be in place, along with a responsive and active monitoring team. Much of the reason for large inbound flows originates upstream – for example, a provider breaks something. So it will happen; it's just a matter of time. The best way to deal with it is through active monitoring and preparation.

CacheFly has the experience and monitoring in place to detect and mitigate large volumes of inbound traffic. The network architecture consists of private connections between all PoP locations, streamlining the shedding of traffic to undersubscribed PoPs as the need arises. In the event of high inbound traffic flows, CacheFly's proactive monitoring and intelligent network design shift traffic between locations, mitigating the effects of uneven traffic flows inherent to an Anycast design.

Benefits of Anycast

Anycast fails faster than DNS, performs better and is simpler to operate. Anycast doesn't suffer from any of the DNS correlation issues, and it doesn't matter which DNS server you came from. The client takes the fastest path from its location, as opposed to the fastest path from wherever the DNS resolver is.

Anycast is a simple, less complex way for user assignment. You’re pushing the complexity and responsibility to the Interior Gateway Protocol (IGP) of the upstream provider, relying on the natural forwarding of the Internet to bring users to the closest PoP.

With an Anycast design, the next time you click a link on a page, or any time your browser goes out and refreshes content, you are on your way to a new PoP. Anycast is faster, as traffic shifts can happen much more quickly, and you don't have to lower users' performance by keeping a low DNS TTL.

Upon network failure, Anycast fails far more quickly than Unicast. Suppose you are having routing issues between location X and location Y. With Anycast, a TCP RST is received and the client immediately moves to the new location. Without Anycast, clients will continually attempt to reach the server in location X, but as it's not available, the client remains stuck until either:

a) The providers converge – but until then, users are waiting, timing out, reloading and timing out over and over again.

b) If location X is down, the client has to wait for the DNS GEO system to realise this and offer a new IP address. Clients either need to time out the IP in the application or resolver, and potentially close and reopen the web browser for things to start working again.

Anycast, on the other hand, breaks quickly and recovers rapidly. During outages, traffic is seamlessly routed to the next best location without requiring browser restarts – a type of convergence not possible with traditional DNS solutions.


Anycast enables the use of high TTLs, as the actual IP address of endpoints never changes. This allows resolvers to cache a response, improving overall end-user experience and network efficiency.

It's also a great tool in a DDoS mitigation solution. With botnet armies reaching terabit-scale attacks, the only cost-effective way to cope is to distribute your architecture, naturally absorbing the attack with an Anycast network.

Everything is debatable

However, Anycast requires some form of stickiness so that flows get the same forwarding treatment. As a result, per-packet load balancing can break Anycast. That said, per-packet load balancing is rarely seen these days, though there is a chance it exists somewhere in a far-flung ISP. Generally speaking, we are designing better networks these days.

TCP/IP uses a separate protocol, ICMP, for out-of-band signalling. As a result, these messages (Path MTU Discovery, for example) may receive different forwarding treatment and may not reach the intended receiver. Technically this is still an issue, but it is not widely a problem on TCP/Anycast networks.

Anycast endpoint selection is based on hop count. That does not mean it routes based on lowest latency or best-performing links. Fewer hops do not mean lower latency: some destinations may be one hop away, but that hop could be a high-latency intercontinental link. More often than not, traffic doesn't have to traverse intercontinental links to reach its final destination. With intelligent PoP placement, content is placed close to the user in the specified regions.
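
The hop-count-versus-latency point can be shown with two invented paths; the hop counts and latencies below are illustrative only:

```python
# Hop count vs latency: the path that hop-based selection prefers
# (fewest hops) is not the lowest-latency one.
paths = {
    "transpacific": {"hops": 1, "latency_ms": 140},  # one long subsea hop
    "regional":     {"hops": 3, "latency_ms": 18},   # three short metro hops
}

by_hops = min(paths, key=lambda p: paths[p]["hops"])
by_latency = min(paths, key=lambda p: paths[p]["latency_ms"])
print(by_hops, by_latency)  # -> transpacific regional
```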

Anycast does move control away from the administrator and into the hands of the Internet. As user requests organically land at the closest PoP, the strict supervision of where users land is removed, potentially leading to capacity management issues at each edge location. As already discussed, this is overcome with experienced monitoring teams – another reason why you shouldn't go with a DIY CDN.


People overestimate how unreliable the Internet is regarding broad events, underestimate the impact of those events on Unicast, and overestimate the impact on Anycast. The unreliability of the Internet is built into its design – the Internet is designed to fail! Yet we assume that under a failure, if we are using TCP/Anycast and the application terminates at the wrong place, the world stops and everything else breaks.

If, due to an intermittent failure or misconfiguration event, a TCP SYN destined for Server X lands on Server Y, then Server Y, having no active TCP session, will – as it should – send an RST back to the client. And if your application doesn't handle network interactions very well, you really shouldn't be running it on the Internet.
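
A client that handles this properly simply reconnects. Here is a hedged sketch using the standard socket module; the host, port, payload and retry policy are caller-supplied assumptions, not a prescribed design:

```python
import socket

def fetch_with_retry(host, port, payload, retries=3):
    """Retry a request when the far end resets or refuses the
    connection (e.g. an anycast segment landing on a PoP with no
    matching session). A fresh connect lets routing pick a live PoP."""
    for attempt in range(retries):
        try:
            with socket.create_connection((host, port), timeout=5) as s:
                s.sendall(payload)
                return s.recv(65536)
        except (ConnectionResetError, ConnectionRefusedError, socket.timeout):
            continue  # reconnect; anycast steers us to a working location
    raise ConnectionError(f"no response from {host} after {retries} attempts")
```

The application-level lesson is the same as the paragraph above: treat an RST as a routine event to recover from, not a catastrophe.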

Networks are built to fail, and they will fail! If you are looking for 100% network reliability and the application can’t handle failures, then you should maybe look to rebuild the application.

This guest contribution is written by Matt Conran, Network Architect for Network Insight. Matt Conran has more than 17 years in the networking industry with entrepreneurial start-ups, government organisations and others. He is a lead Network Architect and has successfully delivered major global greenfield service provider and data centre networks.


Image Credit: Pixabay

Anycast – “Think” before you talk – Part I


How you experience the performance of an application boils down to where you stand on the Internet. Global reachability means everyone can reach everyone, but not everyone gets the same level of service. The map of the Internet looks different for individual users, and proximity plays a significant role in the user experience.

Why can't everyone everywhere have the same level of service, and why do we need optimisations in the first place? Mainly, it boils down to the protocols used for reachability and old congestion management techniques. The Internet comprises old protocols that were never designed with performance in mind. As a result, there are certain steps we must take to overcome its shortcomings.

Performance Challenges

The primary challenge arises from how Transmission Control Protocol (TCP) operates under both normal conditions and stress once the inbuilt congestion control mechanisms kick in.

Depending on configuration, it can take up to 3–5 RTTs to send data back to the client. Also, the congestion control mechanisms of TCP only allow 10 segments to be sent in the first RTT, increasing after that.
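
Under classic slow start (the window doubling each RTT from an initial window of 10 segments, per RFC 6928), the RTT cost of a payload can be estimated like this; the 1 MB payload and 1460-byte MSS are illustrative:

```python
import math

def rtts_to_send(segments, initcwnd=10):
    """RTTs needed to deliver `segments` with classic slow start:
    the window starts at initcwnd and doubles every RTT."""
    sent, cwnd, rtts = 0, initcwnd, 0
    while sent < segments:
        sent += cwnd
        cwnd *= 2
        rtts += 1
    return rtts

# A 1 MB response at a 1460-byte MSS is 685 segments.
segments = math.ceil(1_000_000 / 1460)
print(rtts_to_send(segments))  # -> 7
```

Seven round trips before the payload fully arrives – on a 100 ms path, that is 700 ms spent purely on congestion control ramp-up.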

Unfortunately, this is the basis for congestion control on the Internet, which hinders application performance, especially for those with large payloads.

Help at Hand

There are a couple of things we can engineer to help this. The main one is to move content closer to users by rolling out edge nodes (PoPs) that proxy requests and cache static content. Edge nodes increase client side performance as all connections terminate close to the users. In simplistic terms, the closer you are to the content the better the performance.
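
The effect of proximity on connection setup can be approximated with simple arithmetic; the 3-RTT handshake model (1 RTT for TCP plus 2 for a TLS 1.2 handshake) and the RTT values are illustrative assumptions:

```python
def time_to_first_byte_ms(rtt_ms, handshake_rtts=3):
    """Rough model: TCP handshake (1 RTT) + TLS 1.2 handshake (2 RTTs)
    + 1 RTT for the request itself before any payload arrives."""
    return (handshake_rtts + 1) * rtt_ms

for rtt in (5, 100):  # nearby edge PoP vs distant origin
    print(f"{rtt}ms RTT -> ~{time_to_first_byte_ms(rtt)}ms to first byte")
```

Every round trip in the setup phase is multiplied by the client's distance, which is exactly why terminating connections at a nearby PoP pays off.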

Other engineering tricks involve tweaking how TCP operates. This works to a degree, making bad less bad or good better, but it doesn't necessarily turn a bad connection into a good one.

The correct peering and transits also play a role. Careful selection based on population and connectivity density is a key factor. Essentially, optimum performance comes down to many factors used in conjunction, while reducing latency as much as possible.

PoP locations, peerings and transits with all available optimisations are only part of the puzzle. The next big question is: how do we get users to the right PoP location? And during failure events, how efficiently do users fail over to alternative locations?

In theory, we have two main options for PoP selection:

a) Traditional DNS based load balancing,
b) Anycast.

Before we address these mechanisms, let's dig deeper into some key CDN technologies to better understand which network type is optimal.

Initially, we had web servers in central locations serving content locally. As users became more dispersed, so did the need for content. You cannot have the same stamp for the entire world! There needs to be some network segregation, which gives rise to edge nodes, or PoPs, placed close to the user.

What PoPs Solve

Employing a PoP decreases the connection time, as we are terminating the connection at the local PoP. When the client sends an HTTP GET request, the PoP forwards it to the data centre over an existing hot TCP connection. The local PoP and central data centre are continually talking, so the congestion control windows are high, allowing even 1MB of data to be sent in one RTT. This greatly improves application performance – the world without PoPs would be a pretty slow one.
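
A back-of-the-envelope comparison of a cold end-to-end transfer versus split connections through a nearby PoP; the RTT figures and the 7-round slow-start cost are illustrative assumptions:

```python
def delivery_ms(client_rtt_ms, origin_rtt_ms, cold_rtts=7):
    """Toy comparison: a cold end-to-end transfer needing `cold_rtts`
    slow-start rounds over the full path, versus a PoP that terminates
    the client nearby and forwards over one warm (large-window) RTT
    to the origin."""
    direct = cold_rtts * origin_rtt_ms
    via_pop = cold_rtts * client_rtt_ms + origin_rtt_ms
    return direct, via_pop

print(delivery_ms(client_rtt_ms=5, origin_rtt_ms=100))  # -> (700, 135)
```

The slow-start rounds still happen, but against a 5 ms RTT to the PoP rather than a 100 ms RTT to the origin, while the long haul rides the already-warm connection.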



Selecting the right locations for PoP infrastructure plays an important role in overall network performance and user experience. The most important site selection criterion is to go where the eyeball networks are. You should always try to maximise the number of eyeball networks you are close to when you roll out a PoP. As a result, two fundamental aspects come into play – physical and topological distance.

Well-advanced countries have well-advanced strategies for peering, while others are not so lucky, with less peering diversity due to size or government control. An optimum design targets large population centres with high population and connectivity densities. With power and space being secondary concerns, diverse connectivity is king when it comes to selecting the right PoP location.

New Architectures

If you were to build a Content Delivery Network ten years ago, the design would consist of heavy physical load balancers and separate appliance devices to terminate Secure Sockets Layer (SSL). Current best practice architecture has moved away from this, and it's now all about lots of RAM, SSD and high-performance CPUs piled into compute nodes. Modern CPUs are just as good at calculating SSL, and it's cleaner to terminate everything at the server level rather than on costly dedicated appliances.

CacheFly pushes network complexity to its high-performing servers and runs equal-cost multipath (ECMP) right to the host level. Pushing complexity to the edge of the network is the only way to scale and reduce central state. ECMP right down to the host gives you a routerless design and gets rid of centralised load balancers, allowing incoming requests to be load balanced in hardware on the Layer 3 switch, with the TCP magic performed on the host.
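
The flow-hashing step can be sketched in a few lines; the server names and the CRC32 hash are stand-ins for whatever the switch implements in silicon, not CacheFly's actual scheme:

```python
import zlib

SERVERS = ["edge-a", "edge-b", "edge-c", "edge-d"]  # hypothetical hosts

def ecmp_pick(src_ip, src_port, dst_ip, dst_port, proto="tcp"):
    """Flow-hash sketch of what a Layer 3 switch does in hardware:
    hash the 5-tuple and pick one equal-cost next hop. The same flow
    always hashes to the same server, so TCP state stays local."""
    key = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}/{proto}".encode()
    return SERVERS[zlib.crc32(key) % len(SERVERS)]

a = ecmp_pick("203.0.113.9", 51515, "192.0.2.1", 443)
b = ecmp_pick("203.0.113.9", 51515, "192.0.2.1", 443)
print(a == b)  # the same flow sticks to one host
```

Because the decision is a pure function of the packet headers, no load balancer needs to remember which flow went where.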

CacheFly operates a Route Reflector design consisting of iBGP internally and eBGP to the WAN.

Forget about State

ECMP designs are not concerned with scaling an appliance that holds lots of state. Devices with state are always hard to scale, and load balancing with pure IP is much easier. It allows you to do the inbound load balancing in hardware without the high costs and operational complexities of multiple load balancers and expensive routers. With the new architectures, everything looks like an IP packet, and all switches forward these in hardware. Also, appliance designs usually need two devices for redundancy, plus additional spares in stock, just in case. Costly physical appliances sitting idle in a warehouse are good for no one.

We already touched on the two methods of getting clients to the PoP: traditional DNS-based load balancing and Anycast. Anycast is now deemed the superior option, but in the past it has met some barriers. Anycast has long been popular in the UDP world and now offers the same benefits to TCP-based applications. But there have been some barriers to adoption, mainly down to inaccurate information and a lack of testing.

Barriers to TCP Anycast

The biggest problem for TCP/Anycast is not route convergence and application timeouts; it's that most people think it doesn't work. People believe they know without knowing the facts or putting anything into practice to get those facts.

If you haven't tested, then you shouldn't talk; if you have used it and experienced problems, let's talk. People think that routes are going to converge constantly and bounce between multiple locations, causing TCP resets. This doesn't happen as much as you think, and it's much worse when Anycast is not used.

There is a perception that the Internet, end to end, is an awful place. While there are many challenges, it's not as bad as you might think, especially if the application is built correctly. The Internet is never fully converged, but is this a major problem? If we have deployed an Anycast network, how often would the PoP answering the Anycast IP change for a given user? Almost never.

The Internet may not have a steady state, but what does change is 1) the association of prefixes to Autonomous Systems (AS) and 2) the peering between ASes – two factors that may change best path selection. As a result, we need reliable peering relationships, but this has nothing to do with the Anycast versus Unicast debate.

Building better Networks

Firstly, we need to agree there is no global rulebook for network design, and the creative art of networking comes into play in all Service Provider designs. While SP networks share similar connectivity goals, each and every SP network is configured and designed differently – some with per-packet load balancing, but most without. Still, we as a network design community are rolling out better networks.

There will always be individual art to network design unless we fully automate the entire process, from design to device configurations, which will not happen on a global scale anytime soon. There are many ways and approaches to network design, but as a consensus, we are building and operating better networks. The modern Internet, a network that never fully converges, is overall pretty stable.

Nowadays, we are building better networks. We are forced to do so, as networks are crucial to service delivery. If the network is down or not performing adequately, the services that run on top are useless. This pressure has forced engineers to take a business-orientated approach to networking, with the introduction of automation as an integral part of network stability.

New Tools

In the past, we had primitive debugging and troubleshooting tools, with ping and traceroute the most widely used. Both are crude ways to measure performance and only tell administrators if something is “really” broken. Today, we have an entirely new range of telemetry systems at our disposal that show administrators where the asymmetrical paths are, and overlay network performance with numerous Real User Monitoring (RUM) metrics.

Continue Reading Anycast – “Think” before you talk – Part II


Image Credit: Pexels