Performance Tuning HAProxy
In a recent article, I covered how to tune the NGINX webserver for a simple static HTML page. In this article, we are going to once again explore those performance-tuning concepts and walk through some basic tuning options for HAProxy.
What Is HAProxy?
HAProxy is a software load balancer commonly used to distribute TCP-based traffic to multiple backend systems. It provides not only load balancing but also has the ability to detect unresponsive backend systems and reroute incoming traffic.
In a traditional IT infrastructure, load balancing is often performed by expensive hardware devices. In cloud and highly distributed infrastructure environments, there is a need to provide this same type of service while maintaining the elastic nature of cloud infrastructure. This is the type of environment where HAProxy shines, and it does so while maintaining a reputation for being extremely efficient out of the box.
Much like NGINX, HAProxy has quite a few parameters set for optimal performance out of the box. However, as with most things, we can still tune it for our specific environment to increase performance.
In this article, we are going to install and configure HAProxy to act as a load balancer for two NGINX instances serving a basic static HTML site. Once set up, we are going to take that configuration and tune it to gain even more performance out of HAProxy.
For our purposes, we will be installing HAProxy on an Ubuntu system, where installation is fairly simple. To accomplish this, we will use the Apt package manager; specifically, we will be using the apt-get command:

# apt-get install haproxy
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
  liblua5.3-0
Suggested packages:
  vim-haproxy haproxy-doc
The following NEW packages will be installed:
  haproxy liblua5.3-0
0 upgraded, 2 newly installed, 0 to remove and 0 not upgraded.
Need to get 872 kB of archives.
After this operation, 1,997 kB of additional disk space will be used.
Do you want to continue? [Y/n] y
With the above complete, we now have HAProxy installed. The next step is to configure it to load balance across our backend NGINX instances.
Basic HAProxy Config
In order to set up HAProxy to load balance HTTP traffic across two backend systems, we will first need to modify HAProxy’s default configuration file, /etc/haproxy/haproxy.cfg.

To get started, we will set up a basic frontend service within HAProxy by appending the below configuration block:

frontend www
    bind :80
    mode http
    default_backend bencane.com
Before going too far, let’s break down this configuration a bit to understand what exactly we are telling HAProxy to do.
In this section, we are defining a frontend service for HAProxy. This is essentially a frontend listener that will accept incoming traffic. The first parameter we define within this section is the bind parameter, which tells HAProxy what IP and port to listen on; 0.0.0.0:80 in this case. This means our HAProxy instance will listen for traffic on port 80 across all interfaces and route it through this frontend service named www.
Within this section, we also define the type of traffic with the mode parameter, which accepts tcp or http as options. Since we will be load balancing HTTP traffic, we will use the http value. The last parameter we define is default_backend, which names the backend service HAProxy should load balance to. In this case, we use a value of bencane.com, which will route traffic to our NGINX instances.
backend bencane.com
    mode http
    balance roundrobin
    server nyc2 nyc2.bencane.com:80 check
    server sfo1 sfo1.bencane.com:80 check

In addition to the frontend service, we will also need to define our backend service by appending the above configuration block to the same haproxy.cfg file. Within this backend configuration block, we define the systems that HAProxy will load balance traffic to. Like the frontend section, this section also contains a mode parameter to define whether these are tcp or http backends. For this example, we will once again use http, as our backend systems are a set of NGINX webservers.
In addition to the mode parameter, this section also has a parameter called balance. The balance parameter defines the load-balancing algorithm that determines which backend node each request should be sent to. For this initial step, we can simply set this value to roundrobin, which sends traffic to each backend in turn as it comes in. This setting is pretty common and often the first algorithm that users start with.
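The round-robin selection described above can be sketched in a few lines of Python. This is a simplified illustration of the concept, not HAProxy's actual implementation:

```python
from itertools import cycle

# Hypothetical two-server pool mirroring the backend config in this article.
servers = ["nyc2.bencane.com:80", "sfo1.bencane.com:80"]
rotation = cycle(servers)

def next_server():
    """Hand the next incoming request to the next backend in turn."""
    return next(rotation)

# Six requests alternate evenly between the two backends.
assignments = [next_server() for _ in range(6)]
print(assignments)
```

With equal weights, each backend receives exactly half of the traffic, which is why round robin is such a common starting point.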
The final parameter in the backend service is server, which is used to define each backend system to balance to. In our example, there are two lines that each define a different server; these two servers are the NGINX webservers that we will be load balancing traffic to in this example.

The format of the server line is a bit different from the other parameters, because node-specific settings can be configured via the server parameter. In the example above, for each server we define a name, an address in IP:Port format, and whether or not a health check should be used to monitor the backend node. By adding check after the webserver’s address, we are telling HAProxy to perform a health check to determine whether the backend system is responsive. If the backend system is not responsive, incoming traffic will not be routed to it.
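A stripped-down version of such a health check can be sketched in Python. This is a simplified illustration using a bare TCP connect; HAProxy's real checks are richer, with configurable intervals, rise/fall thresholds, and HTTP-level checks via option httpchk:

```python
import socket

def is_healthy(host, port, timeout=2.0):
    """Crude TCP health check: is anything accepting connections there?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def live_backends(backends):
    """Keep only backends that pass the check, mimicking how HAProxy
    stops routing traffic to unresponsive servers."""
    return [(host, port) for host, port in backends if is_healthy(host, port)]
```

In HAProxy itself, a server that fails enough consecutive checks is marked DOWN and silently removed from the rotation until it recovers.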
With the changes above, we now have a basic HAProxy instance configured to load balance an HTTP service. In order for these configurations to take effect, however, we will need to restart the HAProxy instance. We can do that with the systemctl command:

# systemctl restart haproxy
Now that our configuration changes are in place, let’s go ahead and get started with establishing our baseline performance of HAProxy.
Baselining Our Performance
In the “Tuning NGINX for Performance” article, I discussed the importance of establishing a performance baseline. By measuring performance before making any changes, we can identify whether or not the changes we make have a beneficial effect.
As in the previous article, we will be using the ApacheBench tool to measure the performance of our HAProxy instance. In this example, however, we will be using the -c flag to set the number of concurrent HTTP sessions and the -n flag to specify the number of HTTP requests to make.

# ab -c 2500 -n 5000 -s 90 http://<haproxy-ip>/
Requests per second:    97.47 [#/sec] (mean)
Time per request:       25649.424 [ms] (mean)
Time per request:       10.260 [ms] (mean, across all concurrent requests)
After running the ab (ApacheBench) tool, we can see that out of the box our HAProxy instance is servicing 97.47 HTTP requests per second. This metric will be our baseline measurement; we will compare any changes against this number.
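These three ab numbers are internally consistent: throughput, concurrency, and mean latency are tied together by Little's Law (throughput = concurrency / mean latency). A quick sanity check in Python, using the values reported above:

```python
# Little's Law: throughput = concurrency / mean latency.
concurrency = 2500                  # from ab -c 2500
mean_latency_s = 25649.424 / 1000   # ab's "Time per request" (mean), in seconds

throughput = concurrency / mean_latency_s
print(round(throughput, 2))  # 97.47, matching ab's "Requests per second"
```

This relationship is worth remembering: at a fixed concurrency, any drop in mean latency shows up directly as higher requests per second.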
Setting the Maximum Number of Connections
One of the most common tunable parameters for HAProxy is the maxconn setting, which defines the maximum number of concurrent connections the entire HAProxy instance will accept.
When calling the ab command above, I used the -c flag to tell ab to open 2500 concurrent HTTP sessions. By default, the maxconn parameter is set to 2000, which means a default instance of HAProxy will start queuing HTTP sessions once it hits 2000 concurrent sessions. Since our test launches 2500 sessions, at any given time at least 500 HTTP sessions are being queued while 2000 are serviced immediately. This certainly has an effect on our HAProxy throughput.
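The queueing effect above can be expressed as a one-line helper. This is illustrative only; in practice HAProxy's queueing also interacts with per-backend and per-server connection limits:

```python
def queued_sessions(concurrent, maxconn):
    """Sessions forced to wait once the instance hits its connection cap."""
    return max(0, concurrent - maxconn)

print(queued_sessions(2500, 2000))  # 500 queued at the default maxconn
print(queued_sessions(2500, 5000))  # 0 once maxconn is raised
```

Every queued session adds its wait time to the latency the client observes, which is exactly what drags down the requests-per-second figure.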
Let’s go ahead and raise this limit by once again editing the haproxy.cfg file:

global
    maxconn 5000
Within the haproxy.cfg file, there is a global section, which is used to modify “global” parameters for the entire HAProxy instance. By adding the maxconn setting above, we are increasing the maximum number of connections for the entire instance to 5000, which should be plenty for our testing. In order for this change to take effect, we must once again restart HAProxy using the systemctl command:
# systemctl restart haproxy
With HAProxy restarted, let’s run our test again.
# ab -c 2500 -n 5000 -s 90 http://<haproxy-ip>/
Requests per second:    749.22 [#/sec] (mean)
Time per request:       3336.786 [ms] (mean)
Time per request:       1.335 [ms] (mean, across all concurrent requests)
In our baseline test, the Requests per second value was 97.47. After adjusting the maxconn parameter, the same test returned a Requests per second of 749.22. This is a huge improvement over our baseline and shows just how important the maxconn setting is.
When tuning HAProxy, it is very important to understand your target number of concurrent sessions per instance. By identifying and tuning this value upfront, you can save yourself a lot of troubleshooting with HAProxy performance during peak traffic load.
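One rough way to pick a starting value is again Little's Law: the number of in-flight sessions is roughly your peak arrival rate times your mean response time. The sketch below is my own back-of-the-envelope helper, not an HAProxy formula; the 2x headroom factor is an assumption, and real sizing should also account for memory (each connection consumes kernel and HAProxy buffers):

```python
def suggest_maxconn(peak_rps, mean_latency_s, headroom=2.0):
    """Little's Law: in-flight sessions ~= arrival rate x mean latency;
    multiply by a headroom factor to absorb traffic spikes."""
    return int(peak_rps * mean_latency_s * headroom)

# e.g. 1,000 req/s at a 500 ms mean response time -> ~500 in flight,
# doubled for headroom:
print(suggest_maxconn(1000, 0.5))  # 1000
```

Whatever estimate you arrive at, load test it: the point of this exercise is to find the queueing cliff before your users do.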
In this article, we set the maxconn value to 5000; however, this is still a fairly low number for a high-traffic environment. As such, I would highly recommend identifying your desired number of concurrent sessions and tuning the maxconn parameter before any other parameter when tuning HAProxy.
Multiprocessing and CPU Pinning
Another interesting tunable for HAProxy is the nbproc parameter. By default, HAProxy runs a single worker process, which means that all of our HTTP sessions are load balanced by that single process. With the nbproc parameter, it is possible to create multiple worker processes to help distribute the workload internally.
While additional worker processes might sound good at first, they only tend to provide value when the server itself has more than one CPU. Environments that create multiple worker processes on single-CPU systems commonly find that HAProxy performs worse than it did as a single-process instance, because the overhead of managing multiple workers yields diminishing returns once the number of workers exceeds the number of CPUs available.
With this in mind, it is recommended that the nbproc parameter be set to match the number of CPUs available to the system. In order to tune this parameter for our environment, we first need to check how many CPUs are available. We can do this by executing the lshw command:
# lshw -short -class cpu
H/W path      Device    Class        Description
====================================================
/0/401                  processor    Intel(R) Xeon(R) CPU E5-2630L v2 @ 2.40GHz
/0/402                  processor    Intel(R) Xeon(R) CPU E5-2630L v2 @ 2.40GHz
From the output above, it appears that we have 2 available CPUs on our HAProxy server. Let’s go ahead and set the nbproc parameter to 2, which will tell HAProxy to start a second worker process on restart. We can do this by once again editing the global section of the haproxy.cfg file:

global
    maxconn 5000
    nbproc 2
    cpu-map 1 0
    cpu-map 2 1
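A small script can derive the same section directly from the machine it runs on. This is a sketch of my own, and note a version caveat: the nbproc/cpu-map worker syntax shown here is from the HAProxy 1.x era; nbproc was later deprecated and removed in HAProxy 2.5+, where nbthread is the modern equivalent:

```python
import os

def global_section(ncpu, maxconn=5000):
    """Render a 'global' section with one worker per CPU.
    Workers are numbered from 1, CPUs from 0."""
    lines = ["global", f"    maxconn {maxconn}", f"    nbproc {ncpu}"]
    lines += [f"    cpu-map {worker} {worker - 1}" for worker in range(1, ncpu + 1)]
    return "\n".join(lines)

# os.cpu_count() gives the same count lshw reported above.
print(global_section(os.cpu_count() or 1))
```

Generating the section this way keeps the worker count and the CPU pinning in sync when the same config is rolled out to machines of different sizes.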
In the above HAProxy config example, I included another parameter named cpu-map. This parameter pins a specific worker process to a specific CPU using CPU affinity, which allows the workers to better distribute the workload across multiple CPUs.
While this might not sound very critical at first, it becomes important when you consider how Linux determines which CPU a process should use when it requires CPU time.
Understanding CPU Affinity
The Linux kernel internally has a concept called CPU affinity, whereby a process is pinned to a specific CPU for its CPU time. If we use our system above as an example, we have two CPUs (0 and 1) and a single-threaded HAProxy instance. Without any changes, our single worker process will be pinned to one of these CPUs. If we were to enable a second worker process without specifying which CPU it should have an affinity to, that worker would default to the same CPU the first worker is bound to.

The reason for this lies in how Linux handles CPU affinity of child processes: unless told otherwise, a child process is always bound to the same CPU as its parent. This allows processes to leverage the L1 and L2 caches available on the physical CPU, and in most cases it makes an application perform faster.
The downside to this can be seen in our example. If we enable two workers and both worker1 and worker2 are bound to CPU 0, the workers will constantly compete for the same CPU time. By pinning the worker processes to different CPUs, we can better utilize all of the CPU time available to the system and reduce the number of times our worker processes are waiting for CPU time.
In the configuration above, we are using cpu-map to define CPU affinity by pinning worker1 to CPU 0 and worker2 to CPU 1.
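On Linux, you can inspect affinity yourself: Python's os module wraps the same sched_getaffinity/sched_setaffinity syscalls that CPU pinning relies on (Linux-only APIs; a small illustration):

```python
import os

# Which CPUs is the current process allowed to run on?
allowed = os.sched_getaffinity(0)  # 0 means "this process"
print(sorted(allowed))

# os.sched_setaffinity(0, {0}) would restrict this process to CPU 0,
# analogous to what "cpu-map 1 0" does for HAProxy's first worker.
```

On a running system, `taskset -cp <pid>` (from util-linux) reports the same mask for each HAProxy worker process, which is a quick way to verify that cpu-map took effect.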
After making these changes, we will need to restart the HAProxy instance once again before retesting with the ab tool:
# systemctl restart haproxy
With HAProxy restarted, let’s go ahead and rerun our test with the ab command:
# ab -c 2500 -n 5000 -s 90 http://<haproxy-ip>/
Requests per second:    1185.97 [#/sec] (mean)
Time per request:       2302.093 [ms] (mean)
Time per request:       0.921 [ms] (mean, across all concurrent requests)
In our previous test run, we were able to get a Requests per second of 749.22. With this latest run, after increasing the number of worker processes, we were able to push the Requests per second to 1185.97, a sizable improvement.
Adjusting the Load Balancing Algorithm
The final adjustment we will make is not a traditional tuning parameter, but it still affects how many HTTP sessions our HAProxy instance can process: the load balancing algorithm we have specified.
Earlier in this post, we specified the load balancing algorithm of roundrobin in our backend service. In this next step, we will change the balance parameter to static-rr by once again editing the haproxy.cfg file:

backend bencane.com
    mode http
    balance static-rr
    server nyc2 nyc2.bencane.com:80 check
    server sfo1 sfo1.bencane.com:80 check
The static-rr algorithm is a round robin algorithm very similar to roundrobin, with the exception that it does not support dynamic weighting. The weighting mechanism allows HAProxy to prefer some backends over others; since static-rr does not have to account for weights changing on the fly, it is slightly more efficient than the roundrobin algorithm (approximately 1 percent more efficient).
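The idea of a fixed, weight-baked rotation can be sketched as follows. This is a conceptual simplification of my own; HAProxy's real algorithms interleave weighted servers more smoothly rather than dispatching them in runs:

```python
def static_rr_order(weights):
    """Expand {server: weight} into a fixed dispatch rotation, as static-rr
    effectively does: weights are honored but cannot change at runtime."""
    return [server for server, weight in weights.items() for _ in range(weight)]

# Equal weights degrade to plain round robin between the two NGINX nodes.
print(static_rr_order({"nyc2": 1, "sfo1": 1}))  # ['nyc2', 'sfo1']
print(static_rr_order({"nyc2": 2, "sfo1": 1}))  # ['nyc2', 'nyc2', 'sfo1']
```

Because the rotation never has to be recomputed per request, static-rr trades runtime flexibility (such as adjusting a server's weight via the stats socket) for a small amount of efficiency.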
Let’s go ahead and test the impact of this change by restarting the HAProxy instance and executing another ab test run:
# systemctl restart haproxy
With the service restarted, let’s go ahead and rerun our test.
# ab -c 2500 -n 5000 -s 90 http://<haproxy-ip>/
Requests per second:    1460.29 [#/sec] (mean)
Time per request:       1711.993 [ms] (mean)
Time per request:       0.685 [ms] (mean, across all concurrent requests)
In this final test, we were able to increase our Requests per second metric to 1460.29, a sizable difference from the 1185.97 result of the previous run.

At the beginning of this article, our basic HAProxy instance was only able to service 97 HTTP requests per second. After increasing the maximum number of connections, increasing the number of worker processes, and changing our load balancing algorithm, we were able to push our HAProxy instance to 1460 HTTP requests per second, an improvement of 1405 percent.
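The overall improvement figure is straightforward arithmetic on the rounded throughput numbers quoted above:

```python
baseline, final = 97, 1460  # requests per second, rounded as in the article

# Percentage improvement over the baseline throughput.
improvement_pct = (final - baseline) / baseline * 100
print(round(improvement_pct))  # 1405
```

In other words, the tuned instance handles roughly 15 times the traffic of the out-of-the-box configuration.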
Even with such an increase in performance, there are still more tuning parameters available within HAProxy. While this article covered a few basic and unconventional parameters, we have still only scratched the surface of tuning HAProxy. For more options, you can check out HAProxy’s configuration guide.
Reference: “Performance Tuning HAProxy” from our SCG partner Ben Cane at the Codeship Blog.