Performance Tuning HAProxy
In a recent article, I covered how to tune the NGINX webserver for a simple static HTML page. In this article, we are going to once again explore those performance-tuning concepts and walk through some basic tuning options for HAProxy.
What is HAProxy
HAProxy is a software load balancer commonly used to distribute TCP-based traffic to multiple backend systems. It provides not only load balancing but also has the ability to detect unresponsive backend systems and reroute incoming traffic.
In a traditional IT infrastructure, load balancing is often performed by expensive hardware devices. In cloud and highly distributed infrastructure environments, there is a need to provide this same type of service while maintaining the elastic nature of cloud infrastructure. This is the type of environment where HAProxy shines, and it does so while maintaining a reputation for being extremely efficient out of the box.
Much like NGINX, HAProxy has quite a few parameters set for optimal performance out of the box. However, as with most things, we can still tune it for our specific environment to increase performance.
In this article, we are going to install and configure HAProxy to act as a load balancer for two NGINX instances serving a basic static HTML site. Once set up, we are going to take that configuration and tune it to gain even more performance out of HAProxy.
Installing HAProxy
For our purposes, we will be installing HAProxy on an Ubuntu system, where installation is fairly simple. To accomplish this, we will use the Apt package manager; specifically, the apt-get command.
```
# apt-get install haproxy
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
  liblua5.3-0
Suggested packages:
  vim-haproxy haproxy-doc
The following NEW packages will be installed:
  haproxy liblua5.3-0
0 upgraded, 2 newly installed, 0 to remove and 0 not upgraded.
Need to get 872 kB of archives.
After this operation, 1,997 kB of additional disk space will be used.
Do you want to continue? [Y/n] y
```
With the above complete, we now have HAProxy installed. The next step is to configure it to load balance across our backend NGINX instances.
Basic HAProxy Config
In order to set up HAProxy to load balance HTTP traffic across two backend systems, we will first need to modify HAProxy's default configuration file, /etc/haproxy/haproxy.cfg.
To get started, we will set up a basic frontend service within HAProxy by appending the configuration block below.
```
frontend www
    bind :80
    mode http
    default_backend bencane.com
```
Before going too far, let’s break down this configuration a bit to understand what exactly we are telling HAProxy to do.
In this section, we are defining a frontend service for HAProxy. This is essentially a frontend listener that will accept incoming traffic. The first parameter we define within this section is bind, which tells HAProxy what IP and port to listen on; 0.0.0.0:80 in this case. This means our HAProxy instance will listen for traffic on port 80 and route it through this frontend service named www.
Within this section, we are also defining the type of traffic with the mode parameter, which accepts either tcp or http. Since we will be load balancing HTTP traffic, we will use the http value. The last parameter we are defining is default_backend, which specifies the backend service HAProxy should load balance to. In this case, we will use a value of bencane.com, which will route traffic to our NGINX instances.
```
backend bencane.com
    mode http
    balance roundrobin
    server nyc2 nyc2.bencane.com:80 check
    server sfo1 sfo1.bencane.com:80 check
```
Like the frontend service, we will also need to define our backend service by appending the above configuration block to the same /etc/haproxy/haproxy.cfg file.
In this backend configuration block, we are defining the systems that HAProxy will load balance traffic to. Like the frontend section, this section also contains a mode parameter to define whether these are tcp or http backends. For this example, we will once again use http, as our backend systems are a set of NGINX webservers.
In addition to the mode parameter, this section has a parameter called balance, which defines the load-balancing algorithm that determines which backend node each request should be sent to. For this initial step, we can simply set this value to roundrobin, which distributes traffic evenly across the backends as requests come in. This algorithm is pretty common and often the first one users start with.
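Conceptually, round robin just cycles through the backend list in order. A minimal Python sketch of the idea, using the two example hostnames from our config (an illustration only, not HAProxy's actual implementation), might look like this:

```python
from itertools import cycle

# The two backends from our example config; round robin simply
# hands out each backend in turn, looping back to the start.
servers = ["nyc2.bencane.com:80", "sfo1.bencane.com:80"]
picker = cycle(servers)

def next_server():
    """Return the backend that should receive the next request."""
    return next(picker)

print(next_server())  # nyc2.bencane.com:80
print(next_server())  # sfo1.bencane.com:80
print(next_server())  # nyc2.bencane.com:80
```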
The final parameter in the backend service is server, which is used to define each backend system to balance to. In our example, there are two lines that each define a different server: the two NGINX webservers that we will be load balancing traffic to in this example.
The format of the server line is a bit different from the other parameters because node-specific settings can be configured through it. In the example above, we are defining a label, an IP:Port pair, and whether or not a health check should be used to monitor the backend node.
By specifying check after the webserver's address, we are telling HAProxy to perform a health check to determine whether the backend system is responsive. If it is not, incoming traffic will not be routed to that backend system.
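To make the idea of a health check concrete, here is a rough Python sketch of a layer-4 (TCP connect) check. HAProxy's real checks are far more configurable, so treat this purely as an illustration:

```python
import socket

def tcp_health_check(host, port, timeout=2.0):
    """Return True if host:port accepts a TCP connection within `timeout`.

    Loosely analogous to HAProxy's default layer-4 'check': if the
    connection attempt fails, the backend is considered down.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```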
With the changes above, we now have a basic HAProxy instance configured to load balance an HTTP service. In order for these configurations to take effect, however, we will need to restart the HAProxy instance. We can do that with the systemctl command.
```
# systemctl restart haproxy
```
Now that our configuration changes are in place, let’s go ahead and get started with establishing our baseline performance of HAProxy.
Baselining Our Performance
In the “Tuning NGINX for Performance” article, I discussed the importance of establishing a performance baseline before making any changes. By establishing a baseline performance before making any changes, we can identify whether or not the changes we make have a beneficial effect.
As in the previous article, we will be using the ApacheBench tool to measure the performance of our HAProxy instance. In this example, however, we will be using the -c flag to change the number of concurrent HTTP sessions and the -n flag to specify the number of HTTP requests to make.
```
# ab -c 2500 -n 5000 -s 90 http://104.131.125.168/

Requests per second:    97.47 [#/sec] (mean)
Time per request:       25649.424 [ms] (mean)
Time per request:       10.260 [ms] (mean, across all concurrent requests)
```
After running the ab (ApacheBench) tool, we can see that out of the box our HAProxy instance is servicing 97.47 HTTP requests per second. This will be our baseline measurement; we will measure all subsequent changes against it.
Setting the Maximum Number of Connections
One of the most common tunable parameters for HAProxy is the maxconn setting, which defines the maximum number of concurrent connections the entire HAProxy instance will accept.
When calling the ab command above, I used the -c flag to tell ab to open 2500 concurrent HTTP sessions. By default, the maxconn parameter is set to 2000, which means a default instance of HAProxy will start queuing HTTP sessions once it hits 2000 concurrent sessions. Since our test launches 2500 sessions, at any given time at least 500 HTTP sessions are being queued while 2000 are serviced immediately. This certainly has an effect on our throughput.
Let’s go ahead and raise this limit by once again editing the /etc/haproxy/haproxy.cfg file.
```
global
    maxconn 5000
```
Within the haproxy.cfg file, there is a global section, which is used to modify “global” parameters for the entire HAProxy instance. By adding the maxconn setting above, we are increasing the maximum number of connections for the entire instance to 5000, which should be plenty for our testing. In order for this change to take effect, we must once again restart HAProxy using the systemctl command.
```
# systemctl restart haproxy
```
With HAProxy restarted, let’s run our test again.
```
# ab -c 2500 -n 5000 -s 90 http://104.131.125.168/

Requests per second:    749.22 [#/sec] (mean)
Time per request:       3336.786 [ms] (mean)
Time per request:       1.335 [ms] (mean, across all concurrent requests)
```
In our baseline test, the Requests per second value was 97.47. After adjusting the maxconn parameter, the same test returned a Requests per second of 749.22. This is a huge improvement over our baseline and shows just how important the maxconn setting is.
When tuning HAProxy, it is very important to understand your target number of concurrent sessions per instance. By identifying and tuning this value upfront, you can save yourself a lot of troubleshooting with HAProxy performance during peak traffic load.
In this article, we set the maxconn value to 5000; however, this is still a fairly low number for a high-traffic environment. As such, I would highly recommend identifying your desired number of concurrent sessions and tuning the maxconn parameter before changing anything else when tuning HAProxy.
Multiprocessing and CPU Pinning
Another interesting tunable for HAProxy is the nbproc parameter. By default, HAProxy has a single worker process, which means that all of our HTTP sessions will be load balanced by that single process. With the nbproc parameter, it is possible to create multiple worker processes to help distribute the workload internally.
While additional worker processes might sound good at first, they only tend to provide value when the server itself has more than one CPU. Environments that create multiple worker processes on single-CPU systems often find that HAProxy performs worse than it did as a single-process instance, because the overhead of managing multiple workers yields diminishing returns once the number of workers exceeds the number of CPUs available.
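That rule of thumb, never more workers than CPUs, can be expressed as a quick sketch (an illustration only, not anything HAProxy does itself):

```python
import os

def recommended_workers(desired):
    """Cap the worker count at the number of CPUs available; workers
    beyond that point tend to cost more in overhead than they gain."""
    return min(desired, os.cpu_count() or 1)

print(recommended_workers(1))  # 1 on any machine
```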
With this in mind, it is recommended to set the nbproc parameter to match the number of CPUs available to the system. In order to tune this parameter for our environment, we first need to check how many CPUs are available. We can do this by executing the lshw command.
```
# lshw -short -class cpu
H/W path      Device  Class      Description
============================================
/0/401                processor  Intel(R) Xeon(R) CPU E5-2630L v2 @ 2.40GHz
/0/402                processor  Intel(R) Xeon(R) CPU E5-2630L v2 @ 2.40GHz
```
From the output above, it appears that we have 2 available CPUs on our HAProxy server. Let’s go ahead and set the nbproc parameter to 2, which will tell HAProxy to start a second worker process on restart. We can do this by once again editing the global section of the /etc/haproxy/haproxy.cfg file.
```
global
    maxconn 5000
    nbproc 2
    cpu-map 1 0
    cpu-map 2 1
```
In the above HAProxy config example, I included another parameter named cpu-map, which pins a specific worker process to the specified CPU using CPU affinity. This allows the worker processes to better distribute the workload across multiple CPUs.
While this might not sound very critical at first, it is when you consider how Linux determines which CPU a process should use when it requires CPU time.
Understanding CPU Affinity
The Linux kernel internally has a concept called CPU affinity, whereby a process is pinned to a specific CPU for its CPU time. Using our system above as an example, we have two CPUs (0 and 1) and a single-threaded HAProxy instance. Without any changes, our single worker process will be pinned to either 0 or 1.
If we were to enable a second worker process without specifying which CPU that process should have an affinity to, that process would default to the same CPU that the first worker was bound to.
This is due to how Linux handles CPU affinity of child processes: unless told otherwise, a child process inherits the CPU affinity of its parent. This allows processes to take advantage of the L1 and L2 caches available on the physical CPU, which in most cases makes an application perform faster.
The downside can be seen in our example: if we enable two workers and both worker1 and worker2 are bound to CPU 0, the workers will constantly compete for the same CPU time. By pinning the worker processes to different CPUs, we are able to better utilize all of the CPU time available to our system and reduce how often our worker processes are waiting for CPU time.
In the configuration above, we are using cpu-map to define CPU affinity by pinning worker1 to CPU 0 and worker2 to CPU 1.
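On Linux, this kind of pinning can also be done from userspace via the sched_setaffinity(2) system call, which Python exposes through the os module. The snippet below (Linux-only, and merely illustrative of what cpu-map does under the hood) pins the current process to CPU 0:

```python
import os

def pin_process(pid, cpu):
    """Pin process `pid` to a single CPU core, much like HAProxy's
    cpu-map pins each worker process (Linux-only)."""
    os.sched_setaffinity(pid, {cpu})

if hasattr(os, "sched_setaffinity"):    # only available on Linux
    pin_process(0, 0)                   # pid 0 means the calling process
    print(os.sched_getaffinity(0))      # the affinity set is now {0}
```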
After making these changes, we can restart the HAProxy instance again and retest with the ab tool to see some significant improvements in performance.
```
# systemctl restart haproxy
```
With HAProxy restarted, let’s go ahead and rerun our test with the ab command.
```
# ab -c 2500 -n 5000 -s 90 http://104.131.125.168/

Requests per second:    1185.97 [#/sec] (mean)
Time per request:       2302.093 [ms] (mean)
Time per request:       0.921 [ms] (mean, across all concurrent requests)
```
In our previous test run, we were able to get a Requests per second of 749.22. With this latest run, after increasing the number of worker processes, we were able to push the Requests per second to 1185.97, a sizable improvement.
Adjusting the Load Balancing Algorithm
The final adjustment we will make is not a traditional tuning parameter, but it still affects how many HTTP sessions our HAProxy instance can process: the load balancing algorithm we have specified.
Earlier in this post, we specified the roundrobin load balancing algorithm in our backend service. In this next step, we will change the balance parameter to static-rr by once again editing the /etc/haproxy/haproxy.cfg file.
```
backend bencane.com
    mode http
    balance static-rr
    server nyc2 nyc2.bencane.com:80 check
    server sfo1 sfo1.bencane.com:80 check
```
The static-rr algorithm is a round robin algorithm very similar to roundrobin, except that it does not support dynamic weighting, the mechanism that allows HAProxy to prefer some backends over others at runtime. Since static-rr does not track dynamic weights, it is slightly more efficient than the roundrobin algorithm (approximately 1 percent more efficient).
Let’s go ahead and test the impact of this change by restarting the HAProxy instance and executing another ab test run.
```
# systemctl restart haproxy
```
With the service restarted, let’s go ahead and rerun our test.
```
# ab -c 2500 -n 5000 -s 90 http://104.131.125.168/

Requests per second:    1460.29 [#/sec] (mean)
Time per request:       1711.993 [ms] (mean)
Time per request:       0.685 [ms] (mean, across all concurrent requests)
```
In this final test, we were able to increase our Requests per second metric to 1460.29, a sizable difference from the 1185.97 of the previous run.
Summary
In the beginning of this article, our basic HAProxy instance was only able to service 97 HTTP requests per second. After increasing the maximum number of connections, increasing the number of worker processes, and changing our load balancing algorithm, we were able to push our HAProxy instance to 1460 HTTP requests per second, an improvement of roughly 1,400 percent.
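The headline improvement can be verified with a quick calculation:

```python
baseline = 97.47   # requests per second before tuning
tuned = 1460.29    # requests per second after all three changes
improvement_pct = (tuned - baseline) / baseline * 100
print(round(improvement_pct))  # 1398
```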
Even with such an increase in performance, there are still more tuning parameters available within HAProxy. While this article covered a few basic and unconventional parameters, we have still only scratched the surface of tuning HAProxy. For more options, you can check out HAProxy’s configuration guide.
Reference: Performance Tuning HAProxy from our SCG partner Ben Cane at the Codeship blog.