
Anycast – “Think” before you talk – Part II

This post is Part II of Anycast – “Think” before you talk.

Directing Users to PoPs

Traditional methods of routing users to the closest PoP rely on DNS-based geographic load balancing. DNS attempts to map user requests to the nearest PoP by handing out DNS records based on the user's estimated latitude and longitude. Essentially, the IP address of the PoP is handed to the user based on the IP of the resolver, not the actual client IP address.
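As a rough sketch of the idea (the PoP names, coordinates and lookup logic below are invented for illustration, not any CDN's actual mapping):

```python
import math

# Hypothetical PoP locations: name -> (latitude, longitude)
POPS = {
    "chicago": (41.88, -87.63),
    "amsterdam": (52.37, 4.90),
    "tokyo": (35.68, 139.69),
}

def nearest_pop(resolver_lat: float, resolver_lon: float) -> str:
    """Pick the PoP closest to the *resolver's* estimated location.

    Note the flaw the article describes: the input is the resolver's
    coordinates (from a GeoIP lookup of the resolver IP), not the
    actual client's location.
    """
    def dist(pop: str) -> float:
        lat, lon = POPS[pop]
        return math.hypot(lat - resolver_lat, lon - resolver_lon)
    return min(POPS, key=dist)

# A client in Los Angeles using a resolver hosted in Chicago
# is mapped to the Chicago PoP, however far away the client is.
print(nearest_pop(41.88, -87.63))  # -> chicago
```

The sketch makes the shortcoming obvious: every client behind the same resolver gets the same answer, regardless of where those clients actually sit.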

Benefits of DNS-based LB

One of the main advantages of DNS-based load balancing is control – administrators can direct any DNS request to any node. This is useful for traffic management purposes. It also offers flexibility regarding deployment, as you don't need common carriers or to deal with the intricacies of Anycast routing on the Internet.

However, there are plenty of trade-offs: unlike the Anycast-based approach, it does not handle topology changes well out of the box.

Challenges of DNS based LB

The majority feel it's a very naive approach with plenty of shortcomings and suboptimal user placement. The issues mainly arise from PoP failover, resolver proximity, low TTLs and timeouts.

Suboptimal decisions are made because the DNS mapping is based on the user's name server IP, not the client's actual IP address. This makes DNS-based load balancing an inaccurate method for client proximity routing: the query comes from the client's DNS server, so administrators can only ever optimise performance metrics for the DNS resolver. This has changed in the last few years with the EDNS Client Subnet extension, but it's not fully implemented in all resolvers.

DNS doesn't fail very gracefully when you are in the middle of a session and need to be rerouted to a different PoP. Users may have to close their browsers and reopen them before things start working again.

DNS TTLs can cause lag and performance issues. Whenever you decide you need to change your answer, you have to wait for the DNS TTL to expire. During a failover scenario, the TTL of the response must be reached before clients change locations. Unfortunately, some applications hold on to these values for a long time. Setting a low TTL mitigates this, but the trade-off is performance, as resolvers must frequently re-request the same DNS record.
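The trade-off can be put into rough, back-of-envelope numbers (illustrative only):

```python
def dns_tradeoff(ttl_seconds: int) -> tuple[int, float]:
    """Back-of-envelope view of the low-TTL trade-off.

    - Worst-case failover delay: a resolver may serve the stale record
      for up to one full TTL after a PoP dies.
    - Resolver query load: each distinct resolver must re-ask roughly
      once per TTL instead of answering from cache.
    """
    worst_case_failover_s = ttl_seconds
    lookups_per_hour = 3600 / ttl_seconds  # per resolver, upper bound
    return worst_case_failover_s, lookups_per_hour

print(dns_tradeoff(300))  # 5-minute TTL: up to 300 s stale, ~12 lookups/hour
print(dns_tradeoff(30))   # 30-second TTL: up to 30 s stale, ~120 lookups/hour
```

Shrinking the TTL by 10x shrinks the worst-case staleness by 10x, but multiplies resolver traffic by the same factor – there is no setting that wins on both axes.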

The Anycast Approach

Instead of giving out a different IP per location, Anycast is a mechanism that announces the same IP address from multiple locations. Anycast is nothing special – simply a route with multiple next hops. It's not too different from Unicast; an NLRI object has multiple next hops instead of one. All the magic happens in how you deliver the packet, not in the underlying network transporting it.

When you advertise the same prefix from multiple destinations, the shortest path is chosen based on the user's location. Traffic therefore organically lands where it should, as opposed to being placed under direct control based on GeoIP. Anycast does not rely on a stale GeoIP database; performance rests on the natural flow of the Internet.
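A toy model of that organic selection (the AS-path lengths are invented for illustration): every client's upstream sees the same anycast prefix announced from every PoP, and BGP simply prefers the announcement with the shortest AS path from that client's point in the topology.

```python
# client network -> {PoP: AS-path length to that PoP's announcement}
ROUTES_SEEN = {
    "isp-london": {"london": 1, "newyork": 4, "singapore": 6},
    "isp-sydney": {"london": 6, "newyork": 5, "singapore": 2},
}

def anycast_selects(client_network: str) -> str:
    """BGP best-path selection, reduced to 'shortest AS path wins'."""
    paths = ROUTES_SEEN[client_network]
    return min(paths, key=paths.get)

print(anycast_selects("isp-london"))  # -> london
print(anycast_selects("isp-sydney"))  # -> singapore
```

No central mapping table is consulted; each client's own routing view does the placement.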

The resolver IP is not used; instead, the client IP is used for Anycast routing. This subtle difference offers a more accurate view of where users are located. Users can use whatever resolver they want and will still get the same assignment. As a result, the client's DNS server becomes irrelevant: whatever the question, the answer is the same.

With an Anycast design, there are trade-offs between performance and stability. Anycast works best with a metro- or regionally-based design and with single-PoP-per-location deployments. Multiple PoPs per location might run you into problems. As a general best practice, the more physical space you have between your PoPs, the more stable the overall architecture will be.

Anycast Organic Traffic

Natively, Anycast is not load aware. Large volumes of inbound traffic could potentially saturate a PoP. While this is also true for Unicast traffic, DNS-based routing offers better control over PoP placement, as you can hand out specific IP blocks for specific locations.

The DNS response may be suboptimal, but it still represents a better level of supervision for traffic management purposes. With the Anycast approach to PoP placement, organic traffic flows naturally to each PoP location; you can't control this. Some control was given up moving from traditional DNS-based routing to a TCP-anycast CDN. So what's the best course of action under these circumstances? Should you oversubscribe each PoP to account for the lack of control?

First and foremost, when it happens, you need to be aware of it. It's not acceptable to be unaware of a flood of traffic entering your network. The right monitoring tools need to be in place, along with a responsive and active monitoring team. Much of the cause of large inbound flows lies upstream – for example, a provider breaks something. So it will happen; it's just a matter of time. The best way to deal with it is through active monitoring and preparation.

CacheFly has the experience and monitoring in place to detect and mitigate large volumes of inbound traffic. The network architecture consists of private connections between all PoP locations, streamlining the shedding of traffic to undersubscribed PoPs as the need arises. In the event of high inbound traffic flows, CacheFly's proactive monitoring and intelligent network design shift traffic between locations, mitigating the effects of the uneven traffic flows inherent in an Anycast design.

Benefits of Anycast

Anycast fails over more quickly than DNS, has better performance and is simpler to operate. It doesn't suffer from any of the DNS correlation issues, and it doesn't matter which DNS server you came from. The client takes the fastest path from its own location, as opposed to the fastest path from wherever the DNS resolver is.

Anycast is a simple, less complex way for user assignment. You’re pushing the complexity and responsibility to the Interior Gateway Protocol (IGP) of the upstream provider, relying on the natural forwarding of the Internet to bring users to the closest PoP.

With an Anycast design, the next time you click a link on a page, or any time your browser goes out to refresh content, you are on your way to a new PoP. Anycast is faster because traffic shifts can happen much more quickly, and you don't have to degrade user performance by keeping a low DNS TTL.

Upon network failure, Anycast fails over far more quickly than Unicast. Suppose you have routing issues between location X and location Y. With Anycast, a TCP RST is received and the client immediately starts working against the new location. Without Anycast, clients will continually attempt to reach the server in location X; as it's not available, the client is stuck retrying until either:

a) The providers converge – but until then, users are waiting, timing out, reloading and timing out over and over again.

b) If location X is down, the client has to wait for the DNS GEO system to realise and offer a new IP address. Clients either need to time out the IP in the application or resolver, and potentially close and reopen the web browser for things to start working again.

Anycast, on the other hand, breaks quickly and gets back to work rapidly. During outages, traffic is seamlessly routed to the next best location without requiring browser restarts, a type of convergence not possible with traditional DNS solutions.


Anycast enables the use of high TTLs, as the actual IP address of endpoints never changes. This allows resolvers to cache responses, improving the overall end-user experience and network efficiency.

It's also a great tool in a DDoS mitigation solution. With botnet armies launching terabit-scale attacks, the only cost-effective defence is to distribute your architecture, naturally absorbing the attack across an Anycast network.

Everything is debatable

Anycast does, however, require some form of stickiness so that flows get the same forwarding treatment. As a result, per-packet load balancing can break Anycast. Per-packet load balancing is rarely seen these days, though there is a chance it exists somewhere in a far-flung ISP. Generally speaking, we are designing better networks these days.

TCP/IP uses a different protocol (ICMP) for out-of-band signalling. As a result, these messages (e.g. Path MTU Discovery) may receive different forwarding treatment and may not reach the intended receiver. Technically this is still an issue, but it is not widely a problem on TCP/Anycast networks.

Anycast endpoint selection is based on hop count. That does not mean it routes based on lowest latency or best-performing links: fewer hops do not mean lower latency. Some destinations may be one hop away, but over a high-latency intercontinental link. More often than not, traffic doesn't have to traverse intercontinental links to reach its final destination; with intelligent PoP placement, content is placed close to the user in the specified regions.
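The hops-versus-latency distinction is easy to show with made-up numbers: a one-hop intercontinental link wins on hop count yet loses badly on latency to a three-hop regional path.

```python
# Candidate paths to the anycast prefix, as one client's upstream sees
# them. Hop counts and latencies are invented for illustration.
PATHS = [
    {"via": "intercontinental-link", "as_hops": 1, "latency_ms": 140.0},
    {"via": "regional-peering",      "as_hops": 3, "latency_ms": 18.0},
]

# BGP-style selection: fewest AS hops wins, latency is never consulted.
best_by_hops = min(PATHS, key=lambda p: p["as_hops"])

# What a latency-aware selector would have picked instead.
best_by_latency = min(PATHS, key=lambda p: p["latency_ms"])

print(best_by_hops["via"])     # -> intercontinental-link
print(best_by_latency["via"])  # -> regional-peering
```

Careful PoP placement closes this gap: if the content is already in-region, the short-hop path and the low-latency path are usually the same one.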

Anycast does take control away from the administrator and hands it to the Internet. As user requests organically land at the closest PoP, strict supervision of where users land is removed, potentially leading to capacity management issues at each edge location. As already discussed, this is overcome with experienced monitoring teams – another reason why you shouldn't go with a DIY CDN.

Summary

People overestimate how unreliable the Internet is regarding broad events, underestimate the impact of those events on Unicast, and overestimate the impact on Anycast. The unreliability of the Internet is built into its design – the Internet is designed to fail! Yet we assume that, under a failure, if we are using TCP/Anycast and a session terminates at the wrong place, the world stops and everything breaks.

If, due to an intermediate failure or misconfiguration event, a TCP SYN destined for Server X lands on Server Y, then as this server does not have an active TCP session, it will, as it should, send an RST back to the client. And if your application doesn't handle network interruptions well, you really shouldn't be running it on the Internet.

Networks are built to fail, and they will fail! If you are looking for 100% network reliability and the application can't handle failures, then you should perhaps look to rebuild the application.


This guest contribution is written by Matt Conran, Network Architect for Network Insight. Matt has more than 17 years of networking industry experience with entrepreneurial start-ups, government organisations and others. He is a lead Network Architect and has successfully delivered major global greenfield service provider and data centre networks.

 

Image Credit: Pixabay

Anycast – “Think” before you talk

Part I

Introduction

How you experience the performance of an application boils down to where you stand on the Internet. Global reachability means everyone can reach everyone, but not everyone gets the same level of service. The map of the Internet has different perspectives for individual users and proximity plays a significant role in users’ experience.

Why can't everyone everywhere have the same level of service, and why do we need optimisations in the first place? Mainly, it boils down to the protocols used for reachability and old congestion management techniques. The Internet comprises old protocols that were never designed with performance in mind. As a result, there are certain steps we must take to overcome its shortcomings.

Performance Challenges

The primary challenge arises from how the Transmission Control Protocol (TCP) operates, both under normal conditions and under stress once the inbuilt congestion control mechanisms kick in.

Depending on configuration, it could take anything up to 3 to 5 RTTs to send data back to the client. Also, TCP's congestion control mechanisms only allow 10 segments to be sent in the first RTT, increasing after that.

Unfortunately, this is the basis for congestion control on the Internet, which hinders application performance, especially for applications with large payloads.
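Under simplifying assumptions (an initial congestion window of 10 segments, the window doubling every RTT, a 1460-byte MSS and no loss), the RTT cost of a payload can be sketched:

```python
import math

def rtts_to_send(payload_bytes: int, mss: int = 1460, initcwnd: int = 10) -> int:
    """Count RTTs needed to push a payload through TCP slow start.

    Simplified model: the window starts at `initcwnd` segments and
    doubles every RTT (10, 20, 40, ...) with no loss and no ACK delays.
    """
    segments_left = math.ceil(payload_bytes / mss)
    cwnd, rtts = initcwnd, 0
    while segments_left > 0:
        segments_left -= cwnd
        cwnd *= 2
        rtts += 1
    return rtts

print(rtts_to_send(14_600))     # 10 segments fit in the first window: 1 RTT
print(rtts_to_send(1_000_000))  # ~685 segments: several RTTs of slow start
```

This is why large payloads feel RTT-bound: every extra round trip to a distant server multiplies through the whole slow-start ramp, which is exactly what terminating connections at a nearby PoP avoids.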

Help at Hand

There are a couple of things we can engineer to help. The main one is to move content closer to users by rolling out edge nodes (PoPs) that proxy requests and cache static content. Edge nodes increase client-side performance, as all connections terminate close to the users. In simple terms, the closer you are to the content, the better the performance.

Other engineering tricks involve tweaking how TCP operates. This works to a degree – making bad less bad, or good better – but it doesn't necessarily turn a bad connection into a good one.

The correct peering and transit also play a role. Careful selection based on population and connectivity density is a key factor. Essentially, optimum performance comes down to many factors working in conjunction to reduce latency as much as possible.

PoP locations, peerings and transits, with all available optimisations, are only part of the puzzle. The next big question is: how do we get users to the right PoP location? And during failure events, how efficiently do users fail over to alternative locations?

In theory, we have two main options for PoP selection:

a) Traditional DNS based load balancing,
b) Anycast.

Before we address these mechanisms, let's dig deeper into some key CDN technologies to better understand which network type is optimal.

Initially, we had web servers in central locations serving content. As users became more dispersed, so did the demand for content. You cannot have the same stamp for the entire world! There needs to be some network segregation, which gives rise to edge nodes, or PoPs, placed close to the user.


What PoPs Solve
Employing a PoP decreases the connection time, as we terminate the connection at the local PoP. When the client sends an HTTP GET request, the PoP forwards it to the data centre over an existing hot TCP connection. The local PoP and central data centre are continually talking, so the congestion control windows are high, allowing even 1MB of data to be sent in one RTT. This greatly improves application performance; the world without PoPs would be a pretty slow one.


Selecting the right locations for PoP infrastructure plays an important role in overall network performance and user experience. The most important site selection criterion is to go where the eyeball networks are. You should always try to maximise the number of eyeball networks you are close to when you roll out a PoP. As a result, two fundamental aspects come into play: physical and topological distance.

Well-advanced countries have well-advanced strategies for peering, while others are not so lucky, with less peering diversity due to size or government control. An optimum design targets large population centres with high population and connectivity densities. With power and space a secondary concern, diverse connectivity is king when it comes to selecting the right PoP location.


New Architectures

If you were to build a content delivery network ten years ago, the design would consist of heavy physical load balancers and separate appliance devices to terminate Secure Sockets Layer (SSL). Current best-practice architecture has moved away from this; it's now all about lots of RAM, SSD and high-performance CPUs piled into compute nodes. Modern CPUs are just as good at calculating SSL, and it's cleaner to terminate everything at the server level rather than on costly dedicated appliances.

CacheFly pushes network complexity to its high-performing servers and runs equal-cost multipath (ECMP) right to the host level. Pushing complexity to the edge of the network is the only way to scale and reduce central state. ECMP right down to the host gives you a routerless design and gets rid of centralised load balancers, allowing incoming requests to be load balanced in hardware on the Layer 3 switch, with the TCP magic performed on the host.
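A rough sketch of how hardware ECMP pins a flow to one host (the host names are invented, and the hash is a stand-in; real switches use their own hardware hash functions, not SHA-256):

```python
import hashlib

HOSTS = ["edge-host-1", "edge-host-2", "edge-host-3", "edge-host-4"]

def ecmp_pick(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
              proto: str = "tcp") -> str:
    """Hash the 5-tuple so every packet of a flow lands on the same host.

    This mirrors what the Layer 3 switch does in hardware: no per-flow
    state, just a deterministic hash over packet header fields.
    """
    key = f"{src_ip}|{src_port}|{dst_ip}|{dst_port}|{proto}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(HOSTS)
    return HOSTS[index]

# The same flow always maps to the same host; no load balancer state needed.
a = ecmp_pick("198.51.100.7", 51234, "203.0.113.10", 443)
b = ecmp_pick("198.51.100.7", 51234, "203.0.113.10", 443)
print(a == b)  # -> True
```

Because the mapping is a pure function of the packet headers, the switch needs no session table at all – which is precisely the "forget about state" argument made below.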

CacheFly operates a Route Reflector design consisting of iBGP internally and eBGP to the WAN.


Forget about State

ECMP designs are not concerned with scaling an appliance that holds lots of state. Devices with state are always hard to scale, and load balancing with pure IP is much easier. It allows you to do inbound load balancing in hardware without the high costs and operational complexity of multiple load balancers and expensive routers. With the new architectures, everything looks like an IP packet, and all switches forward these in hardware. Also, appliances usually need to be deployed in redundant pairs, with additional spares in stock just in case – and costly physical appliances sitting idle in a warehouse are good for no one.

We already touched on the two methods of getting clients to the PoP: traditional DNS-based load balancing and Anycast. Anycast is now deemed the superior option but has met some barriers in the past. Anycast has long been popular in the UDP world and now offers the same benefits to TCP-based applications. The barriers to adoption have mainly come down to inaccurate information and a lack of testing.

Barriers to TCP Anycast

The biggest problem for TCP/Anycast is not route convergence and application timeouts; it's that most people think it doesn't work. People believe they know without knowing the facts or putting anything into practice to get those facts.

If you haven't tested it, then you shouldn't talk; if you have used it and experienced problems, let's talk. People think that routes are going to flap and constantly bounce between multiple locations, causing TCP resets. This doesn't happen as much as you think, and things are much worse when Anycast is not used.

There is a perception that the Internet, end-to-end, is an awful place. While there are many challenges, it's not as bad as you might think, especially if the application is built correctly. The Internet is never fully converged, but is this a major problem? If we have deployed an Anycast network, how often would the Anycast IP from a given PoP change? Almost never.

The Internet may not have a steady state, but what does change is 1) the association of prefixes to Autonomous Systems (AS) and 2) the peering between ASes – two factors that may change best-path selection. As a result, we need reliable peering relationships, but this has nothing to do with the Anycast vs. Unicast debate.

Building better Networks

Firstly, we need to agree there is no global rulebook for network design, and the creative art of networking comes into play in all service provider designs. While SP networks offer similar connectivity goals, each and every SP network is configured and designed differently – some with per-packet load balancing, but most without. Still, we as a network design community are rolling out better networks.

There will always be individual art to network design unless we fully automate the entire process from design to device configuration, which will not happen on a global scale anytime soon. There are many approaches to network design, but as a consensus, we are building and operating better networks. The modern Internet, a network that never fully converges, is overall pretty stable.

Nowadays, we are building better networks. We are forced to do so, as networks are crucial to service delivery: if the network is down or not performing adequately, the services that run on top are useless. This pressure has forced engineers to take a business-orientated approach to networking, with the introduction of automation as an integral part of overall network stability.


New Tools

In the past, we had primitive debugging and troubleshooting tools, with ping and traceroute the most widely used. Both are crude ways to measure performance and only tell administrators if something is "really" broken. Today, we have an entirely new range of telemetry systems at our disposal that show administrators where the asymmetrical paths are and overlay network performance based on numerous Real User Monitoring (RUM) metrics.

Continue Reading Anycast – “Think” before you talk – Part II



Image Credit: Pexels

Tech companies need to optimize software delivery and downloads: How CDNs can help

The halcyon days when software programs were delivered on compact discs and DVD media are gradually fading away – remember the floppy disk? Despite this evolution, however, software is not actually getting any smaller. Embedded video, audio and other media content are making the install size of most programs grow at an almost exponential rate.

Tech companies that provide software to customers are faced with a dilemma: they need to provide web-based delivery of their software programs, as well as any updates, while managing the growing file size of this software. Only the best content delivery networks (CDNs) solve this dilemma.

Leveraging a CDN for fast web-based software delivery

Tech companies that offer online software to their customers need to ensure that clients are able to download the software in a timely manner. Slow or interrupted downloads are an easy way to make a customer unhappy, and you can be sure that word of these issues will spread quickly and discourage potential customers.

Leveraging a CDN for software delivery is a more practical way to improve download performance at a minimal cost than trying to make on-premise technology investments at your own company. Letting the experts in the field work their magic allows your firm to concentrate on writing great software.

What a CDN provides to software companies

In addition to download times improved by up to 10x, CDNs also offer other tangible benefits to software providers. Unlimited scalability ensures that as your customer base grows, your download performance doesn't suffer – no matter the size of your software. Security is handled easily using URL/referrer blocking, as well as tight control over subscription-based content.

It is vital for tech companies to put their best foot forward when offering online downloads. Partnering with a quality CDN like CacheFly is a smart move to ensure your customer base remains happy and continues to grow.

Sign up for a free test drive and see why thousands of software providers trust CacheFly for faster downloads.

Photo credit: Wikimedia Commons

Measuring Throughput Performance: DNS vs. TCP Anycast Routing

Since 2002, when we pioneered the first TCP-anycast CDN, CacheFly has always used throughput and availability as the two metrics that drive us as a company.

Many CDNs rely on DNS-based routing methods; however, there are several differences between the two approaches, which directly translate to throughput – the real indicator of a CDN's performance – as well as availability. Since customers frequently ask us what the differences are between the technologies, here's a quick overview discussing the benefits of TCP-anycast routing over DNS.


Cedexis measurement of the highest average CDN throughput for a 100KB file in the U.S., April 2014.


Traditional DNS CDN

DNS-based routing is known as the "traditional", or old-school, way of doing global traffic management. DNS routing works by locating the customer's DNS server, trying to make an informed decision on where that DNS server is located and which CDN location is closest to it, and returning that IP address. This operates under the assumption that the physical location, as well as the network topology, of the DNS server is a good approximation of both of those values for the actual client behind it. This is a big leap of faith* (especially in the age of services like OpenDNS and Google Public DNS carrying a significant amount of the world's DNS traffic). As an example, a certain DSL provider maintains its DNS infrastructure in Atlanta, yet almost 60% of its subscribers are in Southern California – this results in 100% of the traffic behind those DNS servers being served from Atlanta. Not good.

* The edns-client-subnet extension (which CacheFly supports and uses with OpenDNS and Google, among others) "fixes" this problem; however, DNS-based CDNs are struggling with the transition, as their systems were designed to map nameservers to POPs, and the edns-client-subnet solution effectively requires them to map the entire routing table – a much bigger challenge, as it means monitoring performance/availability on a prefix-by-prefix basis in real time.

More importantly, availability and failover are a large challenge when using a DNS solution, as the TTL of the response must be reached to change locations – and even then, some clients cache the first response, and users have to actually restart their browser or client to get a new IP if a POP goes offline (assuming the CDN is even aware that it became unreachable). This can be mitigated by choosing a low TTL, which many providers do; however, this in turn hurts performance, as resolvers must frequently re-request the same DNS record, delaying the first connection for hostnames that should be in the resolver's cache.

TCP-Anycast CDN

Our TCP-anycast method leverages the best of both worlds – using both DNS and the actual core routing table of the Internet (BGP) to intelligently take client requests and serve them from where the *client* is located on the Internet, letting the provider's internal metrics find the topologically closest CDN server. This is a huge win for both performance and availability. With Anycast, the actual IP address of endpoints never changes, which means we can use a high TTL to ensure a great end-user experience by letting resolvers cache a response. In the event of a provider outage, or if we need to take a POP offline for maintenance, traffic is seamlessly routed to the next best location, without requiring a browser restart, and with a rapid convergence time that's simply not possible with DNS solutions.

Evaluating CDNs? Look for throughput.

Many of our customers use CacheFly to deliver larger files – videos, apps, games, software downloads – and our throughput performance makes it a no-brainer to use CacheFly. What most people don't realise is that throughput is *as important* for small object/web page delivery. When researching web performance, it's easy to be convinced that response time, or time-to-first-byte (TTFB), is the metric you need to optimise for. Those same articles and so-called experts will also tell you it's extremely important to enable browser-side caching, so that your clients don't have to make a 304 request back to the CDN.

Here's the thing: measuring "response time", or TTFB, is simply measuring the performance of 304 responses (headers without content). These are the very requests you just eliminated with client-side browser caching!

So, if you're not re-requesting content from the CDN, you want that first request (the 200 response) to complete as fast as possible. That's time-to-last-byte (TTLB) – that's throughput!
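In numbers (the timings below are made up for illustration): throughput is simply object size over time-to-last-byte, and TTFB alone tells you little about it.

```python
def throughput_mbps(object_bytes: int, ttlb_seconds: float) -> float:
    """Throughput = payload size / time-to-last-byte, in megabits per second."""
    return (object_bytes * 8) / ttlb_seconds / 1_000_000

# Two hypothetical CDNs serving a 100 KB object:
cdn_a = {"ttfb_s": 0.020, "ttlb_s": 0.500}  # great first byte, slow last byte
cdn_b = {"ttfb_s": 0.060, "ttlb_s": 0.100}  # slower first byte, fast last byte

print(round(throughput_mbps(100_000, cdn_a["ttlb_s"]), 2))  # -> 1.6
print(round(throughput_mbps(100_000, cdn_b["ttlb_s"]), 2))  # -> 8.0
```

Ranked by TTFB, CDN A looks three times faster; ranked by throughput, CDN B delivers the object five times sooner – which is what the user actually experiences.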

 

Start optimizing your static objects for time-to-last-byte and your site will load faster, period.

 

So, why do people still focus on response time?

First, it’s still a huge factor in loading dynamic, server-generated content where the payload is small and the client spends most of the time waiting for the response to be generated.

However, for large, static content, the TTFB is a small percentage of the overall request – the client spends most of the time actually downloading the object.

Using latency or response time to estimate TTLB/throughput is a reasonable idea – when you don't have a way to measure throughput. And for most of the 2000s, people didn't have a way to measure this in the real world, so TTFB was as good a metric as any.

However, with the advent of RUM (real user monitoring) measurements of page render time from companies like New Relic, and companies like Cedexis and CloudHarmony actually measuring and reporting on real-world throughput, there's no reason to use TTFB to make guesses about how fast a page will load – you can actually choose the fastest provider based on throughput.

Whether you’re looking for a CDN or are already using one, make sure you’re optimizing for throughput. I encourage you to take advantage of our free test account and experience the CacheFly difference for yourself.

CacheFly Now Using Authy Two-Factor Authentication

To protect our customers against fraudulent data breaches, we have taken account security to the next level by integrating Two-Factor Authentication (2FA).  We evaluated several of the 2FA solutions on the market, and Authy gave us the easiest implementation, a great mobile client, and great customer support. If your organization is looking for a two-factor authentication solution, we highly recommend Authy!

What is two-factor authentication?
Two-factor authentication adds a second level of authentication to an account login. Credentials fall into three types: something you know (e.g. a PIN or password), something you have (e.g. an ATM card or token), or something you are (e.g. fingerprints, voice prints, iris patterns). With 2FA, the user must correctly present two of these three factors to log in successfully.

To enable two-factor authentication, follow the steps below.  


Step 1: Login

To set up two-factor authentication, simply log in as usual, then click on "manage two factor auth."

Step 2: Settings

Enter your email and mobile phone number to receive a link to download the Authy app.


Step 3: Retrieve your unique token

After you download the Authy app onto your phone, it will ask you to register your phone and will text you a unique token.


Step 4: Add your Authy token

Enter your unique token into your account portal and click “finish.” You’re done!

Step 5: Login

You will then see a confirmation stating "This account is protected by two-factor authentication." The next time you log in, you will be asked for your token code as the second level of authentication.

Are you using two-factor authentication for your business yet? Find out more information on Authy’s two-factor authentication by visiting authy.com.