Matt Levine

Load balancing on JunOS

One of the fun things that tends to happen when I’m coaching someone through an Anycast deployment is getting past the first proof of concept, which usually looks something like: “a west coast webserver” and “east coast webserver” each announcing a /32 via BGP.

At this point I tend to recommend the approach we’ve used at CacheFly over the last 15 years or so. We do *not* rely on the machines themselves to handle the BGP injection. We’ve always had a designated machine (or machines) in each POP that performs health checks and handles the network magic. The tip I’m going to share works great even if you do decide to use the machines themselves as the injectors, but you’d still want *something* health-checking the services locally and calling the up/down scripts. You could get away with just calling them at boot/shutdown, but I don’t recommend it!

Using separate ‘health check machines’ also works well if you have some sort of stateful load balancing equipment (eww) that you want to use behind ECMP.

I’m going to leave out the nuts and bolts of the anycast config itself, and of configuring health-checking software like Keepalived, and focus on the ECMP concept, which, once you get the hang of it, you’ll find useful and easy to deploy. Obviously you’ve already configured your FIB to load-balance per-packet, and if you’re on a QFX you’re ideally using enhanced-hash-key ecmp-resilient-hash. Here we go:
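(If that part isn’t in place yet, the usual JunOS recipe is a forwarding-table export policy plus, on QFX, the resilient-hashing knob mentioned above. The policy name PFE-LB here is my own, and note that despite the name, load-balance per-packet actually gives per-flow hashing on modern platforms:)

```
policy-options {
    policy-statement PFE-LB {
        then {
            load-balance per-packet;
        }
    }
}
routing-options {
    forwarding-table {
        export PFE-LB;
    }
}
forwarding-options {
    enhanced-hash-key {
        ecmp-resilient-hash;
    }
}
```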

For the example I’m going to use, I’ll say we have 3 servers and want to weight them 2:2:1. Let’s say their actual IPs are on 172.16.1.0/24 and they are:

  • 172.16.1.101
  • 172.16.1.102
  • 172.16.1.103

The basic config looks like configuring a *static* route for your service VIP, with *fake* next-hops, we’ll use 192.168.1.1 as the VIP and 10.1.1.0/24 for our fake next-hops:

routing-options {
    static {
        route 192.168.1.1/32 {
            next-hop [ 10.1.1.1 10.1.1.2 10.1.1.3 10.1.1.4 10.1.1.5 ];
            resolve;
        }
    }
}

The important part here is resolve. 10.1.1.0/24 is not reachable, and *should not be*. Use some address space that you’ll never use for any other reason.

(Please note that Cisco and other vendors automatically resolve recursive routes. JunOS, by default, will only let you point a static route at a directly connected next-hop; the resolve keyword is what enables recursive lookups on JunOS.)
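(For the curious, the IOS-style equivalent really would just be plain static routes, since the recursion happens automatically there. Illustrative only, not part of this setup:)

```
ip route 192.168.1.1 255.255.255.255 10.1.1.1
ip route 192.168.1.1 255.255.255.255 10.1.1.2
ip route 192.168.1.1 255.255.255.255 10.1.1.3
ip route 192.168.1.1 255.255.255.255 10.1.1.4
ip route 192.168.1.1 255.255.255.255 10.1.1.5
```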

Next you would have a BGP session (or OSPF or IS-IS… I prefer to keep everything simple and keep it all in BGP, but you just need *something* to send routes to your ECMP device), with an import policy that accepts 10.1.1.0/24 up to /32s.
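On the JunOS side, that import policy can be a simple route-filter. Everything here is a sketch: the policy and group names are mine, and I’m assuming the health-checker peers from 172.16.1.250 in the same AS as the Quagga config below:

```
policy-options {
    policy-statement ACCEPT-FAKE-NH {
        term fake-next-hops {
            from {
                route-filter 10.1.1.0/24 upto /32;
            }
            then accept;
        }
        then reject;
    }
}
protocols {
    bgp {
        group health-checkers {
            type internal;
            import ACCEPT-FAKE-NH;
            neighbor 172.16.1.250;
        }
    }
}
```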

On the health-checker machines themselves, you want this in your Quagga BGP config:

router bgp 65536
 redistribute kernel route-map LB
 neighbor 172.16.1.1 remote-as 65536
 neighbor 172.16.1.1 prefix-list LB out
!
ip prefix-list LB seq 5 permit 10.1.1.0/24 ge 32
!
route-map LB permit 10
 match ip address prefix-list LB
!
route-map LB deny 20
!

Hopefully you see where we’re going by now. The final step is to have your “up” and “down” scripts configured on your health-checker, which just need to do this:

server 1 up script:

/sbin/ip route add 10.1.1.1/32 via 172.16.1.101
/sbin/ip route add 10.1.1.2/32 via 172.16.1.101

server 2 up script:

/sbin/ip route add 10.1.1.3/32 via 172.16.1.102
/sbin/ip route add 10.1.1.4/32 via 172.16.1.102

server 3 up script:

/sbin/ip route add 10.1.1.5/32 via 172.16.1.103
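Since all of these scripts are the same two-line pattern, it’s easy to fold them into one small helper on the health-checker. A minimal sketch — the announce function and the RUN dry-run convention are my own invention; real use means clearing the dry-run default and running as root:

```shell
#!/bin/sh
# Shared helper for the health-checker's up/down scripts.
# RUN=echo (the default here) prints each command instead of executing it;
# set RUN= to really modify the kernel routing table (requires root).
set -eu
RUN="${RUN:-echo}"

# announce <add|delete> <server-ip> <fake-next-hop>...
# Points each fake /32 at (or away from) the given server's real IP.
announce() {
  action="$1"; server="$2"; shift 2
  for nh in "$@"; do
    $RUN /sbin/ip route "$action" "${nh}/32" via "$server"
  done
}

# server 1 up (weight 2 = two fake next-hops):
announce add 172.16.1.101 10.1.1.1 10.1.1.2
# server 3 up (weight 1):
announce add 172.16.1.103 10.1.1.5
```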

And there you have it. Now on your switch/router you should see:

admin@device> show route 192.168.1.1
inet.0: 699161 destinations, 4883639 routes (699076 active, 0 holddown, 1316436 hidden)
+ = Active Route, - = Last Active, * = Both
192.168.1.1/32 *[Static/5] 
                      to 172.16.1.101 via irb.100
                      to 172.16.1.101 via irb.100
                    > to 172.16.1.102 via irb.100
                      to 172.16.1.102 via irb.100
                      to 172.16.1.103 via irb.100

Voila!

Your down scripts would then be the opposite, e.g. on server 1:

/sbin/ip route delete 10.1.1.1/32 via 172.16.1.101
/sbin/ip route delete 10.1.1.2/32 via 172.16.1.101

Which leaves you with:

inet.0: 699161 destinations, 4883639 routes (699076 active, 0 holddown, 1316436 hidden)
+ = Active Route, - = Last Active, * = Both
192.168.1.1/32 *[Static/5] 
                      to 172.16.1.102 via irb.100
                      to 172.16.1.102 via irb.100
                    > to 172.16.1.103 via irb.100

There you have it. You’ve now got a weight-able, health-checkable ECMP load balancing solution.


