This is a quick overview of some common ways to load balance your web services with software load balancers. The series has several parts, each covering a different software load balancer. This part covers Perlbal.
In this post I’ll be using two different VPS accounts as an example, since we need to spread the load across something. I’ll refer to them as VPS1 and VPS2. In the examples they run Ubuntu, though most of the tutorial applies to any distribution. One thing to note is that I’ll be using Aptitude for package management, which is fairly specific to Debian-based distributions. Usually Yum will be your alternative, and in the worst-case scenario you will have to compile the packages manually.
What is Perlbal?
Perlbal is a single-threaded event-based server supporting HTTP load balancing, web serving, and a mix of the two.
Perlbal is a software load balancer created by Danga Interactive for use on LiveJournal, Typepad, and other ventures. It’s a lightweight reverse proxy that can also act as a web server, though that is outside the scope of this post.
What is a “reverse proxy”?
According to Wikipedia a reverse proxy is:
A reverse proxy or surrogate is a proxy server that is installed in a server network. Typically, reverse proxies are used in front of Web servers. All connections coming from the Internet addressed to one of the Web servers are routed through the proxy server, which may either deal with the request itself or pass the request wholly or partially to the main web servers.
I believe this description is very fitting – it’s essentially a “middle man” for a request (in this case HTTP request) that decides where it should be routed to. This could mean simply deciding to put it on VPS1, or VPS2 – or it could mean if the HTTP request is for an image we’ll send it to Lighty, and if it’s a normal request to a PHP page we’ll send it to Apache. Think of it as a “router” in the literal sense.
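To make the “router” idea concrete, here is a toy sketch of that decision in shell. It is not Perlbal code. The function name route_request, the extension list and the backend addresses are all made up for illustration.

```shell
#!/bin/sh
# Toy routing decision, not Perlbal code: given a request path, print the
# backend a reverse proxy might choose. The backend names and addresses
# are hypothetical placeholders.
route_request() {
    path="${1%%\?*}"    # drop any query string before matching
    case "$path" in
        *.png|*.jpg|*.gif|*.css|*.js)
            echo "lighty 127.0.0.1:8181" ;;   # static file: lightweight server
        *)
            echo "apache 127.0.0.1:8080" ;;   # everything else: application server
    esac
}

route_request /images/logo.png      # prints: lighty 127.0.0.1:8181
route_request "/index.php?page=2"   # prints: apache 127.0.0.1:8080
```

A real reverse proxy makes the same kind of choice, only on live connections and usually based on the Host header or URL rather than a hard-coded list.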
Why is it useful?
I’ll give two simple scenarios to try and explain the usefulness of reverse proxies:
Scenario 1: You’re serving 100 requests/second and are running out of memory on your server. You notice that a good 80% of the requests are simply for static files, such as images and stylesheets. Unfortunately Apache still uses a large amount of memory and CPU for these requests, causing you to exhaust your resources. One way to reduce the usage is to set up an alternative web server with a lighter footprint, such as Lighty, to serve your static files, and keep Apache for the requests that reach your application.
Scenario 2: We’ll again use the example of serving 100 requests/second, but let’s say your server simply can’t handle the traffic. You could set up another server identical to your current one and have the reverse proxy listen for incoming requests and direct the traffic to both servers seamlessly. This splits the load across both machines and keeps the website fast and stable.
These examples may have some flaws, though they’re simply here to illustrate common uses / the benefits of using a reverse proxy (or software load balancer).
First things first
First you need to find where the bottleneck is in your application. Is it the web server or the database server? Could it simply be slow SQL queries that could be fixed by adding an index? Ultimately you want to make sure that your application is optimized as much as possible, and that the bottleneck can’t simply be fixed by a few tweaks.
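One quick way to see whether static files make up a large share of your traffic is to count them in the web server’s access log. This is a rough sketch that assumes the common/combined log format, where the request path is the 7th field. The function name static_share, the extension list and the default log path are assumptions you should adjust.

```shell
#!/bin/sh
# Sketch: estimate what share of requests in an access log are for static
# files. Assumes the common/combined log format, where the request path is
# the 7th whitespace-separated field; the extension list is illustrative.
static_share() {
    awk '
        { total++ }
        $7 ~ /\.(png|jpe?g|gif|ico|css|js)([?].*)?$/ { hits++ }
        END {
            if (total == 0) { print "no requests found"; exit }
            printf "%d of %d requests (%d%%) were for static files\n",
                   hits, total, hits * 100 / total
        }
    ' "$1"
}

# Assumed default location of the Apache access log on Ubuntu.
LOG="${LOG:-/var/log/apache2/access.log}"
if [ -r "$LOG" ]; then
    static_share "$LOG"
fi
```

If the percentage is high, the first scenario is worth a look. If most requests hit your application, look at the application and the database first.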
How do I set up the environment described in Scenario 1?
In this setup Apache serves every HTTP request that involves PHP (your application), and Lighty serves static files such as images. The goal is to take advantage of Lighty’s low footprint for static files and lower the overall resource usage of your server.
1) Getting the packages
You will need the following packages:
- Apache (set up for PHP)
- Lighttpd
- Perlbal
On Ubuntu each of these should be in the repositories, so you can install them via Aptitude:
aptitude install apache2 lighttpd perlbal -y
I believe most package repositories have Apache and Lighttpd, though Perlbal can be hit or miss. Luckily Perlbal is on CPAN, so you can install it fairly easily:
perl -MCPAN -e 'install Perlbal'
2) Setting up Apache, Lighttpd
Once you’ve installed the packages, configure both web servers to listen on a port other than 80, because Perlbal will be listening on port 80 and deciding what to do with each request. In this example Apache listens on port 8080 and Lighty on port 8181.
For Apache, change the Listen directive (on Ubuntu it normally lives in /etc/apache2/ports.conf):
Listen 8080
The rest of the Apache configuration is up to you; the point here is simply to have it listen on port 8080 instead of 80.
For Lighty, set the port in /etc/lighttpd/lighttpd.conf:
server.port = 8181
Once again, the rest of the Lighty configuration is up to you; we’re just having it listen on port 8181 instead of 80.
3) Setting up Perlbal
Now we have to set up Perlbal to handle the requests. Every request for the subdomain “static.domain.com” will be served by Lighttpd, and everything else by Apache. A quick note before we begin: Perlbal is capable of serving files itself, so Lighttpd is not strictly required. I’m simply using it to illustrate how one would go about doing it.
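LOAD vhosts
CREATE POOL apache_pool
POOL apache_pool ADD 127.0.0.1:8080
CREATE POOL lighttpd_pool
POOL lighttpd_pool ADD 127.0.0.1:8181
CREATE SERVICE apache_server
SET role = reverse_proxy
SET pool = apache_pool
ENABLE apache_server
CREATE SERVICE lighttpd_server
SET role = reverse_proxy
SET pool = lighttpd_pool
ENABLE lighttpd_server
CREATE SERVICE select
SET listen = 0.0.0.0:80
SET role = selector
SET plugins = vhosts
VHOST *.domain.com = apache_server
VHOST static.domain.com = lighttpd_server
ENABLE select
In a nutshell this configuration does three important things. It tells Perlbal to listen on port 80 on all interfaces so it can accept normal HTTP requests, it sends any request for “*.domain.com” to our Apache instance, and it sends anything for “static.domain.com” to our Lighty instance. Note that a VHOST line points at a service rather than directly at a pool, which is why each backend gets its own small reverse_proxy service, and that a service only starts once it has been enabled with ENABLE. For more information on Perlbal’s configuration please check out its documentation and mailing list.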
4) Start everything up
Now we simply need to start up our web servers and Perlbal accordingly:
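The exact commands depend on how each package was installed. Here is a minimal sketch, assuming the stock Ubuntu init scripts for Apache and Lighty and a Perlbal configuration saved at /etc/perlbal/perlbal.conf. The helper names run and start_scenario1 are made up for this example, and the script only prints the commands unless you set DRY_RUN=0.

```shell
#!/bin/sh
# Sketch only: prints the start-up commands by default (DRY_RUN=1).
# Run with DRY_RUN=0 as root to actually execute them.
# Assumes init scripts named apache2 and lighttpd, and a Perlbal config
# at /etc/perlbal/perlbal.conf; adjust both to match your system.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

start_scenario1() {
    # Backends first, so Perlbal has something to proxy to.
    run /etc/init.d/apache2 restart
    run /etc/init.d/lighttpd restart
    # --daemon puts Perlbal in the background.
    run perlbal --config=/etc/perlbal/perlbal.conf --daemon
}

start_scenario1
```

Restarting the web servers (rather than just starting them) makes sure they pick up the new port settings if they were already running.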
5) See if it works
Last but not least we need to make sure everything works. Simply visit “domain.com” and make sure it loads, then visit “static.domain.com” and verify that it loads too. If both load fine it should be working. If you want to be sure, you can take a peek in the logs to see the incoming requests (/var/log/apache2/* , /var/log/lighttpd/*).
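If DNS for the two hostnames is not set up yet, you can run the same check from the command line by sending the Host header yourself. This is a rough sketch that assumes curl is installed and Perlbal is reachable at the PROXY address. The function name check_vhost and the PROXY variable are made up for this example.

```shell
#!/bin/sh
# Sketch: ask Perlbal (assumed to be listening on PROXY) for each hostname
# and print the HTTP status code it returns. domain.com and
# static.domain.com are the placeholder names from the config above.
PROXY="${PROXY:-http://127.0.0.1:80/}"

check_vhost() {
    host="$1"
    # -H overrides the Host header, so DNS does not need to be set up yet.
    code="$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' \
                 -H "Host: $host" "$PROXY" 2>/dev/null || true)"
    case "$code" in
        ""|000) echo "$host -> no response" ;;
        *)      echo "$host -> HTTP $code" ;;
    esac
}

check_vhost domain.com
check_vhost static.domain.com
```

A status code only tells you that something answered. To confirm which backend handled each request, the access logs are still the place to look.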
How do I set up the environment described in Scenario 2?
In this setup we’re simply going to split the requests across two VPS servers. One of the VPS instances will house the Perlbal instance as well as a web server, and the other VPS will simply house another web server.
1) Getting the packages
I’m going to assume you already have a web server of your choice set up on both VPS1 and VPS2. All you will need to install is Perlbal:
aptitude install perlbal -y
or install Perlbal via CPAN:
perl -MCPAN -e 'install Perlbal'
2) Configuring the web server
The only configuration required on the web server is simply changing what port it is listening on. Keep in mind you only have to do this if Perlbal is sharing the same system as a web server – if you’ve decided to give Perlbal its own environment this is not necessary. In these examples I’ll assume you changed the web servers to listen on port 8080.
3) Configuring Perlbal
Now we have to set up Perlbal to listen on port 80 and direct the requests among VPS1 and VPS2:
CREATE POOL web_servers
POOL web_servers ADD 126.96.36.199:8080
POOL web_servers ADD 188.8.131.52:8080
CREATE SERVICE balancer
SET listen = 0.0.0.0:80
SET role = reverse_proxy
SET pool = web_servers
SET persist_client = on
SET persist_backend = on
SET verify_backend = on
SET balance_method = random
ENABLE balancer
The configuration is fairly self-explanatory. We create a “pool” and add our web servers to it, then create a “service” that listens on port 80, acts as a reverse proxy and uses the pool we just created. The final ENABLE line is what actually starts the service. You don’t need to worry too much about the other options, except for “balance_method”, which has two choices: random and round-robin. Random picks a server from the pool at random for each request, while round-robin cycles through the servers in the pool in order.
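To illustrate the difference between the two methods, here is a small sketch in shell. It is not how Perlbal implements them internally. The pool addresses and the function names pick_round_robin and pick_random are hypothetical.

```shell
#!/bin/sh
# Illustration only, not Perlbal's implementation: two ways to pick a
# backend from a pool. The addresses are hypothetical placeholders.
POOL="10.0.0.1:8080 10.0.0.2:8080"

# Round-robin: request number N goes to backend N modulo the pool size.
pick_round_robin() {
    n="$1"
    set -- $POOL                  # split the pool into positional parameters
    shift $(( $n % $# ))
    echo "$1"
}

# Random: each request picks a backend independently of earlier requests.
pick_random() {
    set -- $POOL
    # Read two random bytes as an unsigned integer.
    r="$(od -An -N2 -tu2 /dev/urandom | tr -d ' \n')"
    shift $(( $r % $# ))
    echo "$1"
}

for i in 0 1 2 3; do
    echo "request $i -> $(pick_round_robin "$i") (round-robin), $(pick_random) (random)"
done
```

With round-robin the two backends strictly alternate. With random the split only evens out over many requests, so short runs can look lopsided.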
4) Start Perlbal
Now all you have to do is start perlbal:
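How you start it depends on how Perlbal was installed. Here is a minimal sketch, assuming the configuration above was saved to /etc/perlbal/perlbal.conf (an assumed path, adjust as needed) and using the perlbal script’s --config and --daemon options. The wrapper function start_perlbal is made up for this example and only prints the command unless you set DRY_RUN=0.

```shell
#!/bin/sh
# Sketch: start Perlbal in the background with an explicit config file.
# CONFIG is an assumed location; change it to wherever you saved the config.
CONFIG="${CONFIG:-/etc/perlbal/perlbal.conf}"

start_perlbal() {
    conf="$1"
    # Refuse to start without a non-empty config file.
    if [ ! -s "$conf" ]; then
        echo "config not found or empty: $conf" >&2
        return 1
    fi
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: perlbal --config=$conf --daemon"
    else
        perlbal --config="$conf" --daemon
    fi
}

start_perlbal "$CONFIG" || echo "nothing started"
```

Remember that binding to port 80 normally requires root privileges.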
5) See if it works
Simply visit your domain and see if the page loads. You can refresh a few times to make sure that it’s hitting both servers and that both are responding. Once again you may want to check your web servers’ logs to verify they’re getting the requests.
This post is meant only as a primer, not as a guide to follow in a production environment. I understand there are some principles I either skimmed over or completely omitted. This post was written solely from memory and past experience, so please be careful if you try to use a setup similar to the ones outlined here. If you come across any errors please let me know so I can fix them.