
ionice 'idle' not having the expected effects

We're working with a reasonably busy web server. We wanted to use rsync to do some data-moving which was clearly going to hammer the magnetic disk, so we used ionice to put the rsync process in the idle class. The queues for both disks on the system (SSD+HDD) are set to use the CFQ scheduler. The result... was that the disk was absolutely hammered and the website performance was appalling.

I've done some digging to see if any tuning might help with this. The man page for ionice says:

Idle: A program running with idle I/O priority will only get disk time
when no other program has asked for disk I/O for a defined grace period.
The impact of an idle I/O process on normal system activity should be zero.

This "defined grace period" is not clearly explained anywhere I can find with the help of Google. One posting suggest that it's the value of fifo_expire_async but I can't find any real support for this.

However, on our system, both fifo_expire_async and fifo_expire_sync are set sufficiently long (250 ms and 125 ms respectively, which are the defaults) that, given how busy the disk is, the idle class should be getting essentially NO disk bandwidth at all. Even if the person who believes that the grace period is set by fifo_expire_async is plain wrong, there's not a lot of wiggle-room in the statement "The impact of an idle I/O process on normal system activity should be zero".

Clearly this is not what's happening on our machine so I am wondering if CFQ+idle is simply broken.

Has anyone managed to get it to work? Tips greatly appreciated!

Update: I've done some more testing today. I wrote a small Python app to read random sectors from all over the disk with short sleeps in between. I ran a copy of this without ionice and set it up to perform around 30 reads per second. I then ran a second copy of the app with various ionice classes to see if the idle class did what it said on the box. I saw no difference at all between the results when I used classes 1, 2, 3 (real-time, best-effort, idle). This, despite the fact that I'm now absolutely certain that the disk was busy. Thus, I'm now certain that - at least for our setup - CFQ+idle does not work. [see Update 2 below - it's not so much "does not work" as "does not work as expected"...]
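
For reference, the test load looked roughly like the sketch below. This is a reconstruction rather than the exact script: it assumes the HDD is /dev/sdb, uses 4 KiB O_DIRECT reads at random offsets so the page cache can't hide the I/O, and is meant to be run once plain and once under ionice (e.g. ionice -c 3) for comparison.

# Approximate reconstruction of the random-read load generator (assumptions:
# /dev/sdb is the HDD under test; needs root; roughly 30 reads per second).
import os, random, time, mmap

DEV = "/dev/sdb"      # device under test (assumption)
READ_SIZE = 4096      # one 4 KiB, sector-aligned read per iteration
SLEEP = 1.0 / 30      # roughly 30 reads per second

fd = os.open(DEV, os.O_RDONLY | os.O_DIRECT)
dev_size = os.lseek(fd, 0, os.SEEK_END)

# O_DIRECT needs an aligned buffer; mmap gives us page-aligned memory.
buf = mmap.mmap(-1, READ_SIZE)

try:
    while True:
        # Pick a random 4 KiB-aligned offset anywhere on the disk.
        offset = random.randrange(0, dev_size // READ_SIZE) * READ_SIZE
        os.preadv(fd, [buf], offset)
        time.sleep(SLEEP)
finally:
    os.close(fd)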

Comments still very welcome!

Update 2: More poking about today. Discovered that when I push the I/O rate up dramatically, the idle-class processes DO in fact start to become starved. In my testing, this happened at I/O rates hugely higher than I had expected - basically hundreds of I/Os per second. I'm still trying to work out what the tuning parameters do...

I also discovered the rather important fact that async disk writes aren't included at all in the I/O prioritisation system! The ionice manpage I quoted above makes no reference to that fact, but the manpage for the syscall ioprio_set() helpfully states:

I/O priorities are supported for reads and for synchronous (O_DIRECT, O_SYNC) writes. I/O priorities are not supported for asynchronous writes because they are issued outside the context of the program dirtying the memory, and thus program-specific priorities do not apply.

This pretty significantly changes the way I was approaching the performance issues and I will be proposing an update for the ionice manpage.
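
To make the distinction concrete, here is a small sketch (file names are placeholders): a buffered write is flushed later by the kernel's writeback threads, outside the writing process's context, so the ionice class is ignored; an O_SYNC (or O_DIRECT) write completes in the caller's context and the class applies.

import os

data = b"\0" * (1 << 20)   # 1 MiB per write

# Buffered write: the dirtied pages are written back by the kernel later,
# outside this process's I/O context -- an idle ionice class has no effect.
with open("/tmp/buffered.bin", "wb") as f:
    f.write(data)

# Synchronous write: the data reaches the disk before write() returns, in
# this process's context, so "ionice -c 3 python3 script.py" does apply.
fd = os.open("/tmp/sync.bin", os.O_WRONLY | os.O_CREAT | os.O_SYNC, 0o644)
try:
    os.write(fd, data)
finally:
    os.close(fd)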

Some more info on kernel and iosched settings (sdb is the HDD):

Linux 4.9.0-4-amd64 #1 SMP Debian 4.9.65-3+deb9u1 (2017-12-23) x86_64 GNU/Linux
/etc/debian_version = 9.3

(cd /sys/block/sdb/queue/iosched; grep . *)
back_seek_max:16384
back_seek_penalty:2
fifo_expire_async:250
fifo_expire_sync:125
group_idle:8
group_idle_us:8000
low_latency:1
quantum:8
slice_async:40
slice_async_rq:2
slice_async_us:40000
slice_idle:8
slice_idle_us:8000
slice_sync:100
slice_sync_us:100000
target_latency:300
target_latency_us:300000
Asked by Neilski, Jan 21 '18


2 Answers

AFAIK, the only way to solve your problem is to use cgroup v2 (kernel 4.5 or newer). Please see the following article:

https://andrestc.com/post/cgroups-io/

Also note that you can use systemd's wrappers to configure cgroup limits on a per-service basis:

http://0pointer.de/blog/projects/resources.html
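
For what it's worth, a minimal sketch of what the cgroup v2 approach in the first link boils down to is below. Everything device- and name-specific is an assumption: the unified hierarchy is mounted at /sys/fs/cgroup, the io controller is available, you run as root, 8:16 is sdb's major:minor (check with ls -l /dev/sdb), and the group name is made up.

import os

CG = "/sys/fs/cgroup/rsync-limited"   # hypothetical group name
DEV = "8:16"                          # major:minor of the HDD (assumption)

# Create the child cgroup and make the io controller available to it.
os.makedirs(CG, exist_ok=True)
with open("/sys/fs/cgroup/cgroup.subtree_control", "w") as f:
    f.write("+io")

# Cap the group at ~10 MiB/s of writes and 100 read IOPS on that device.
with open(os.path.join(CG, "io.max"), "w") as f:
    f.write(f"{DEV} wbps={10 * 1024 * 1024} riops=100")

# Move this process into the group; anything it spawns (e.g. the rsync)
# inherits the limit.
with open(os.path.join(CG, "cgroup.procs"), "w") as f:
    f.write(str(os.getpid()))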

Answered by Stanislav Klinkov


Add nocache to that and you're set (you can combine it with ionice and nice): https://github.com/Feh/nocache

On Ubuntu install with: apt install nocache

It simply bypasses the page cache for the command's I/O, so other processes won't starve when the cache is flushed. It's similar to running the command with O_DIRECT, so you can then limit the I/O, for example with:

systemd-run --scope -q --nice=19 -p BlockIOAccounting=true -p BlockIOWeight=10 -p "BlockIOWriteBandwidth=/dev/sda 10M" nocache youroperation_here

I usually use it with:

nice -n 19 ionice -c 3 nocache youroperation_here
Answered by Airstriker