Improving Linux System Performance with I/O Scheduler Tuning
In a previous article, I wrote about using pgbench to tune PostgreSQL. While I covered a very common tunable, shared_buffers, there are many other tuning options that can be used to gain performance from PostgreSQL.
Today’s article is going to cover one of those additional options. However, this tunable does not belong to PostgreSQL; it belongs to the Linux kernel.
In today’s article, we will be adjusting the Linux I/O scheduler and measuring the impact of those changes with pgbench. We will do this using the same PostgreSQL environment we used in the previous article. All of the tuning parameters from the previous article have already been applied to the environment we will be using today.
What Is an I/O Scheduler?
The I/O Scheduler is an interesting subject; it’s something that’s rarely thought about unless you are trying to get the best performance out of your Linux systems. Before going too deep into how to change the I/O scheduler, let’s take a moment to better familiarize ourselves with what I/O schedulers provide.
Disk access has always been considered the slowest method of accessing data. Even with the growing popularity of Flash and Solid State storage, accessing data from disk is considered slower when compared to accessing data from RAM. This is especially true when you have infrastructure that is using spinning disks.
This is because traditional spinning disks store data at physical locations on a rotating platter. To read data, the drive must move its read/write head to the right track and wait for the platter to rotate into position. This process is known as “seeking” and, in terms of computing, it can take a long time.
I/O schedulers exist as a way to optimize disk access requests. They traditionally do this by merging I/O requests to similar locations on disk. By grouping requests located at similar sections of disk, the drive doesn’t need to “seek” as often, improving the overall response time for disk operations.
On modern Linux implementations, there are several I/O scheduler options available. Each of these has its own method of scheduling disk access requests. In the rest of this article, we will break down how each of these schedulers prioritizes disk access and measure the performance changes from scheduler to scheduler.
Changing the I/O Scheduler
For today’s article, we will be using an Ubuntu Linux server for our tests. With Ubuntu, the I/O scheduler can be changed both at runtime and at boot. Changing the scheduler at runtime is as simple as changing the value of a file located within /sys. Changing the value at boot, which allows you to maintain the setting across reboots, involves changing the kernel parameters passed via the Grub boot loader.
Before we change the I/O scheduler, however, let’s first identify our current I/O scheduler. This can be accomplished by reading the /sys/block/<disk device>/queue/scheduler file.
# cat /sys/block/sda/queue/scheduler
noop [deadline] cfq
The above shows that the I/O scheduler for disk sda is currently set to deadline.
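The active scheduler is the name shown in square brackets. When scripting, it can be handy to pull out just that name. The following is a minimal sketch; the function name active_scheduler is my own and not part of any standard tool.

```shell
# Print the active scheduler (the bracketed name) from the contents of a
# queue/scheduler file. The function name is illustrative only.
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Example using the output shown above:
active_scheduler "noop [deadline] cfq"    # prints: deadline

# On a live system (assumes the disk is sda):
# active_scheduler "$(cat /sys/block/sda/queue/scheduler)"
```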
One important item to remember is that I/O scheduling methods are defined at the Linux kernel level, but they are applied on each disk device separately. If we were to change the value in the file above, this would mean that all filesystems on disk device sda will use the new I/O scheduler.
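Because the setting is per device, it can help to see every disk at once. The sketch below loops over a sysfs-style directory and prints each device with its scheduler line. The function name list_schedulers and its optional root argument are assumptions of mine, added so the function can be pointed at a test directory.

```shell
# List the scheduler line for every block device under a sysfs-style root.
# The root defaults to /sys/block.
list_schedulers() {
    root="${1:-/sys/block}"
    for f in "$root"/*/queue/scheduler; do
        [ -r "$f" ] || continue          # skip if the glob matched nothing
        dev="${f#"$root"/}"              # strip the leading root
        dev="${dev%%/*}"                 # keep only the device name
        printf '%s: %s\n' "$dev" "$(cat "$f")"
    done
}

list_schedulers
```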
As with anything performance-tuning related, it is important to understand what types of workloads exist for the environment being tuned. Each I/O scheduler has a unique way to prioritize disk operations. Understanding the workload required makes it easier to select the right scheduler.
However, like any other performance-tuning change, it is always best to test multiple options and choose based on the results. This is exactly what we will be doing in this article.
Runtime modification of I/O scheduler
As I mentioned earlier, there are two ways to change the I/O scheduler. We can change the scheduler at runtime, which applies immediately to a running system, or we can modify the Grub boot loader’s configuration to apply the scheduler on boot.
Since we will be performing benchmark tests to evaluate which scheduler provides the best results for our PostgreSQL instance, we will start off by changing the scheduler at runtime.
To accomplish this, we simply need to overwrite the /sys/block/<disk device>/queue/scheduler file with the new I/O scheduler selection.
# echo "cfq" > /sys/block/sda/queue/scheduler
# cat /sys/block/sda/queue/scheduler
noop deadline [cfq]
From the above, we can see that echoing cfq to the /sys/block/sda/queue/scheduler file changed our current I/O scheduler to CFQ. This change takes effect immediately. This means we can start testing the scheduler performance without having to restart PostgreSQL or any other service.
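When switching schedulers from a script, it may be worth checking that the kernel actually offers the requested scheduler before writing to the file. The following is a rough sketch under that assumption; set_scheduler is a hypothetical helper name, and the optional second argument exists only so the function can be tried against an ordinary file.

```shell
# Switch schedulers only if the requested one is listed as available.
# Arguments: scheduler name, then the path to a queue/scheduler file
# (defaults to sda, the disk used in this article). Needs root on a
# real system.
set_scheduler() {
    want="$1"
    file="${2:-/sys/block/sda/queue/scheduler}"
    # Drop the brackets so the currently active scheduler matches too.
    available=$(sed 's/[][]//g' "$file") || return 1
    for name in $available; do
        if [ "$name" = "$want" ]; then
            printf '%s\n' "$want" > "$file"
            return
        fi
    done
    echo "scheduler '$want' is not offered by $file" >&2
    return 1
}

# Usage: set_scheduler cfq
```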
Testing PostgreSQL & I/O Scheduler Performance
Since we have already changed the I/O scheduler to CFQ, we will go ahead and start our testing with the CFQ I/O scheduler.
CFQ
The Completely Fair Queuing (CFQ) I/O scheduler works by creating a per-process I/O queue. The goal of this I/O scheduler is to provide a fair I/O priority to each process. While the CFQ algorithm is complex, the gist of this scheduler is that after ordering the queues to reduce disk seeking, it services these per-process I/O queues in a round-robin fashion.
What this means for performance is that the CFQ scheduler tries to provide each process with the same priority for disk access. However, in doing so it makes this scheduler less optimal for environments that might need to prioritize one request type (such as reads) from a single process.
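To make the round-robin idea concrete, here is a toy model. It is a heavy simplification: real CFQ hands out time slices and sorts requests within each queue, whereas this sketch takes one request per process per turn. The function name round_robin and the request labels are made up for illustration.

```shell
# Toy round-robin over per-process queues. Each argument is one
# process's queue, written as space-separated request names.
round_robin() {
    awk 'BEGIN {
        max = 0
        for (i = 1; i < ARGC; i++) {
            len[i] = split(ARGV[i], parts, " ")
            for (j = 1; j <= len[i]; j++) req[i, j] = parts[j]
            if (len[i] > max) max = len[i]
        }
        out = ""
        # Turn j takes the j-th request from every queue that has one.
        for (j = 1; j <= max; j++)
            for (i = 1; i < ARGC; i++)
                if (j <= len[i]) out = out (out == "" ? "" : " ") req[i, j]
        print out
    }' "$@"
}

# Three processes A, B and C with 3, 1 and 2 pending requests:
round_robin "A1 A2 A3" "B1" "C1 C2"    # prints: A1 B1 C1 A2 C2 A3
```

Note how process A, despite having the most requests queued, cannot starve B or C.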
With that understanding of the CFQ scheduler, let’s go ahead and establish a benchmark performance metric for our PostgreSQL database instance with pgbench.
# su - postgres
In order to run pgbench, we first need to switch to the postgres user. Once there, we can execute the same pgbench command executed in our previous article.
$ pgbench -c 100 -j 2 -t 1000 example
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 50
query mode: simple
number of clients: 100
number of threads: 2
number of transactions per client: 1000
number of transactions actually processed: 100000/100000
latency average: 60.823 ms
tps = 1644.104024 (including connections establishing)
tps = 1644.228715 (excluding connections establishing)
From the above, we can see that our tps reached roughly 1,644 transactions per second. While not a bad start, this is not the fastest scheduler for this workload.
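Since we will be comparing this figure across several runs, it can be convenient to pull the tps value out of the pgbench output automatically. The following sketch assumes the output format shown above; other pgbench versions may word the tps lines differently. The function name extract_tps is my own.

```shell
# Read pgbench output on stdin and print the tps figure that excludes
# connection setup. Matches the output format shown in this article.
extract_tps() {
    awk '/^tps = .*excluding connections/ { print $3 }'
}

# Usage (as the postgres user):
# pgbench -c 100 -j 2 -t 1000 example | extract_tps
```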
Deadline
The Deadline scheduler works by creating two queues: a read queue and a write queue. Each I/O request has a time stamp associated that is used by the kernel for an expiration time.
While this scheduler also attempts to service the queues based on the most efficient ordering possible, the timeout acts as a “deadline” for each I/O request. When an I/O request reaches its deadline, it is pushed to the highest priority.
While tunable, the default “deadline” values are 500 ms for read operations and 5,000 ms for write operations. Based on these values, we can see why the Deadline scheduler is considered an optimal scheduler for read-heavy workloads. Because reads expire ten times sooner than writes, the Deadline scheduler tends to service reads ahead of writes.
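The effect of the two timeouts is easy to see with a little arithmetic. The sketch below computes when a request would expire given its type and submission time. The variable and function names are illustrative; on a system running the Deadline scheduler, the real values are exposed as read_expire and write_expire under /sys/block/<disk device>/queue/iosched/.

```shell
# Default expiry times in milliseconds, as described above.
READ_EXPIRE_MS=500
WRITE_EXPIRE_MS=5000

# Print the time (in ms) at which a request submitted at time $2 expires.
request_deadline() {
    case "$1" in
        read)  echo $(( $2 + READ_EXPIRE_MS )) ;;
        write) echo $(( $2 + WRITE_EXPIRE_MS )) ;;
        *)     echo "unknown request type: $1" >&2; return 1 ;;
    esac
}

# A read and a write both submitted at t = 1000 ms:
request_deadline read 1000     # prints: 1500
request_deadline write 1000    # prints: 6000
```

The read reaches its deadline 4.5 seconds before the write submitted at the same moment, which is why reads tend to be pushed to the front.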
Now that we understand the Deadline scheduler a bit better, let’s go ahead and change to the Deadline scheduler and see how it holds up to our pgbench testing.
# echo deadline > /sys/block/sda/queue/scheduler
# cat /sys/block/sda/queue/scheduler
noop [deadline] cfq
With the above, we can see that our I/O scheduler is now the Deadline scheduler. Let’s go ahead and run our pgbench test again.
# su - postgres
$ pgbench -c 100 -j 2 -t 1000 example
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 50
query mode: simple
number of clients: 100
number of threads: 2
number of transactions per client: 1000
number of transactions actually processed: 100000/100000
latency average: 46.700 ms
tps = 2141.318132 (including connections establishing)
tps = 2141.489076 (excluding connections establishing)
This time it seems that pgbench was able to reach 2,141 transactions per second. This is an increase of roughly 500 transactions per second, or about 30 percent, which is pretty sizable.
What this tells us is that even though pgbench is creating a database workload that is both read and write heavy, the overall PostgreSQL instance benefits from a read-priority-based I/O scheduler.
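To put the difference between two runs in relative terms, a small helper can compute the percentage change. This is only a sketch; tps_gain is a name I made up, and the inputs are the tps values (including connection establishment) from the CFQ and Deadline runs above.

```shell
# Print the percentage change from a baseline tps to a new tps.
tps_gain() {
    awk -v base="$1" -v new="$2" \
        'BEGIN { printf "%.1f\n", (new - base) / base * 100 }'
}

# CFQ run versus Deadline run:
tps_gain 1644.104024 2141.318132    # prints: 30.2
```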
Noop
The Noop scheduler is a unique scheduler. Rather than prioritizing specific I/O operations, it simply places all I/O requests into a FIFO (First in, First Out) queue. While this scheduler does try to merge similar requests, that is the extent of the complexity of this scheduler.
This scheduler is optimized for systems that essentially do not need an I/O scheduler. It suits scenarios where the underlying disk infrastructure already performs I/O scheduling, such as virtual machines.
Since a VM runs within a host server/OS, that host may already have an I/O scheduler in use. In this scenario, each disk operation passes through two I/O schedulers: one for the VM and one for the VM host.
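The FIFO-plus-merging behavior described above can be modeled in a few lines. This is a simplified sketch, not the kernel's implementation: a request is written as start:length (in sectors) and is merged only when it begins exactly where the last queued request ends. The function name fifo_merge and the sample requests are invented for the example.

```shell
# Simplified model of a FIFO queue with back-merging. Requests keep
# their arrival order; a request that is contiguous with the tail of
# the queue is folded into it instead of being queued separately.
fifo_merge() {
    awk 'BEGIN {
        n = 0
        for (i = 1; i < ARGC; i++) {
            split(ARGV[i], r, ":")
            if (n > 0 && start[n] + len[n] == r[1] + 0) {
                len[n] += r[2]          # contiguous: extend the tail
            } else {
                n++                     # not contiguous: new entry
                start[n] = r[1] + 0
                len[n] = r[2] + 0
            }
        }
        for (i = 1; i <= n; i++)
            printf "%s%d:%d", (i > 1 ? " " : ""), start[i], len[i]
        print ""
    }' "$@"
}

# Five requests arrive; two pairs are contiguous and get merged:
fifo_merge 0:8 8:8 100:4 104:4 50:2    # prints: 0:16 100:8 50:2
```

The request at sector 50 stays last even though it sits between the other two on disk, because nothing is reordered.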
Let’s take a look at what kind of performance Noop has in our environment.
# echo noop > /sys/block/sda/queue/scheduler
# cat /sys/block/sda/queue/scheduler
[noop] deadline cfq
With the above, the scheduler has been changed to the Noop scheduler. We can now run pgbench to measure the impact of this I/O scheduler.
# su - postgres
$ pgbench -c 100 -j 2 -t 1000 example
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 50
query mode: simple
number of clients: 100
number of threads: 2
number of transactions per client: 1000
number of transactions actually processed: 100000/100000
latency average: 46.364 ms
tps = 2156.838618 (including connections establishing)
tps = 2157.102989 (excluding connections establishing)
From the above, we can see that we were able to reach 2,156 transactions per second. This is only slightly better than the Deadline scheduler. One reason this scheduler may perform better in our case is that the environment we are testing with is hosted within a VM.
This means that regardless of the changes being made within the VM, the I/O scheduler in use on the VM host will stay the same.
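The three test runs above could also be driven from a single script. The following is a sketch only, using the same commands shown in this article. The DRY_RUN and DISK variables and the function names are my own additions; by default the script just prints what it would run.

```shell
# Run the same pgbench test under each scheduler in turn.
# With DRY_RUN=1 (the default here) the commands are only printed.
# Set DRY_RUN=0 and run as root to actually execute them.
DRY_RUN="${DRY_RUN:-1}"
DISK="${DISK:-sda}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        sh -c "$*"
    fi
}

benchmark_schedulers() {
    for sched in cfq deadline noop; do
        run "echo $sched > /sys/block/$DISK/queue/scheduler"
        run "su - postgres -c 'pgbench -c 100 -j 2 -t 1000 example'"
    done
}

benchmark_schedulers
```

Running each scheduler more than once and comparing the averages would give a more trustworthy picture than a single pass.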
Changing the Scheduler on Boot
Since the Noop scheduler provided quite a bit of improvement over the CFQ scheduler, let’s go ahead and make that change permanent. To do this, we will need to edit the /etc/default/grub configuration file.
# vi /etc/default/grub
The /etc/default/grub configuration file is used to configure the Grub boot loader. In this case, we will be looking for an option named GRUB_CMDLINE_LINUX. This option is used to add kernel boot parameters on startup.
The parameter we need to add is the elevator parameter. This is used to specify the desired I/O scheduler. Let’s go ahead and add the parameter specifying the Noop scheduler.
GRUB_CMDLINE_LINUX="elevator=noop"
In the above, we added elevator=noop. This defines that the I/O scheduler on boot should be the Noop I/O scheduler. Once the changes have been made, we will need to run the update-grub2 command to apply the changed configuration.
# update-grub2
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-4.4.0-62-generic
Found initrd image: /boot/initrd.img-4.4.0-62-generic
Found linux image: /boot/vmlinuz-4.4.0-57-generic
Found initrd image: /boot/initrd.img-4.4.0-57-generic
done
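For scripted setups, the edit to /etc/default/grub could be made without opening an editor. The sketch below prints a modified copy of the file rather than changing it in place, so the result can be reviewed first. It assumes GRUB_CMDLINE_LINUX is already present on a single double-quoted line, and a kernel that honors the elevator parameter, such as the 4.4 kernel shown above. The function name with_elevator is hypothetical.

```shell
# Print the contents of a grub defaults file with elevator=<scheduler>
# set in GRUB_CMDLINE_LINUX. An existing elevator= value on that line
# is replaced. The file itself is not modified.
with_elevator() {
    sched="$1"
    file="${2:-/etc/default/grub}"
    sed -e '/^GRUB_CMDLINE_LINUX="/s/ *elevator=[^ "]*//' \
        -e "s/^GRUB_CMDLINE_LINUX=\"\(.*\)\"/GRUB_CMDLINE_LINUX=\"\1 elevator=$sched\"/" \
        -e 's/^\(GRUB_CMDLINE_LINUX="\) */\1/' \
        "$file"
}

# Usage: write the result to a scratch file and compare before installing.
# with_elevator noop > /tmp/grub.new
# diff /etc/default/grub /tmp/grub.new
```

The GRUB_CMDLINE_LINUX_DEFAULT line is left untouched because the patterns require the opening quote to follow GRUB_CMDLINE_LINUX= directly.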
With the grub configuration applied, we can now reboot the system and validate that the changes are still in effect.
# cat /sys/block/sda/queue/scheduler
[noop] deadline cfq
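Keep in mind that the elevator parameter applies to every disk device. If only one disk should use a particular scheduler at boot, one common approach is a udev rule that writes to that device's queue/scheduler attribute. The sketch below only prints such a rule; the function name and the rules file name in the comment are assumptions of mine, and this approach was not tested in this article.

```shell
# Print a udev rule that sets the scheduler for a single disk device.
udev_scheduler_rule() {
    printf 'ACTION=="add|change", KERNEL=="%s", ATTR{queue/scheduler}="%s"\n' "$1" "$2"
}

udev_scheduler_rule sda noop

# To install it (hypothetical file name), as root:
# udev_scheduler_rule sda noop > /etc/udev/rules.d/60-ioscheduler.rules
```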
Summary
In this article, we learned about the various I/O schedulers available on a typical Ubuntu Linux system. We also used pgbench to explore the effects these I/O schedulers have on our PostgreSQL instance.
While our testing showed the Noop scheduler was the most performant for our environment, each environment is different. The type of service being executed and the use of that service can change the performance profile of an environment greatly.
Reference: Improving Linux System Performance with I/O Scheduler Tuning, from our SCG partner Ben Cane at the Codeship Blog.
Thanks!
But one question: How do you change the I/O scheduler on boot only for one specific disk device?
GRUB_CMDLINE_LINUX="elevator=noop" changes the scheduler for all disk devices, which isn’t optimal in many cases.