Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can I/O latency cause a simple UPDATE to take seconds in MySQL?

Tags:

mysql

vps

disk-io

My MySQL application is experiencing slow performance when running some UPDATE, INSERT and DELETE queries. In this question, I will only discuss one particular UPDATE, because it's enough to demonstrate the problem:

UPDATE projects SET ring = 5 WHERE id = 1

This UPDATE is usually fast enough, around 0.2ms, but every now and then (enough to be a problem) it takes several seconds. Here's an excerpt from the log (look at the 4th line):

 ~ (0.000282) UPDATE `projects` SET `ring` = 5 WHERE `id` = 1
 ~ (0.000214) UPDATE `projects` SET `ring` = 6 WHERE `id` = 1
 ~ (0.000238) UPDATE `projects` SET `ring` = 7 WHERE `id` = 1
 ~ (3.986502) UPDATE `projects` SET `ring` = 8 WHERE `id` = 1
 ~ (0.000186) UPDATE `projects` SET `ring` = 9 WHERE `id` = 1
 ~ (0.000217) UPDATE `projects` SET `ring` = 0 WHERE `id` = 1
 ~ (0.000162) UPDATE `projects` SET `ring` = 1 WHERE `id` = 1

projects is an InnoDB table with 6 columns of types INT and VARCHAR, 17 rows and an index on id. It happens with other tables too, but here I'm focusing on this one. When trying to solve the problem, I ensured that the queries were all sequential, so this is not a lock issue. The UPDATE above is executed in the context of a transaction. Other information on the server:

  • VPS with 4GB RAM (was 1GB), 12GB free disk space
  • CentoOS 5.8 (was 5.7)
  • MySQL 5.5.10 (was 5.0.x)

The "was" bit above means it didn't work before or after the upgrade.

What I've tried so far, to no avail:

  • Setting innodb_flush_log_at_trx_commit to 0, 1 or 2
  • Setting innodb_locks_unsafe_for_binlog on or off
  • Setting timed_mutexes on or off
  • Changing innodb_flush_method from the default to O_DSYNC or O_DIRECT
  • Increasing innodb_buffer_pool_size from the default to 600M and then to 3000M
  • Increasing innodb_log_file_size from the default to 128M
  • Compiling MySQL from source
  • Running SHOW PROCESSLIST, which informs me that the state is "updating"
  • Running SHOW PROFILE ALL, which says that almost all the time was spent on "updating", and that, within that step, not so much time was spent on CPU cycles and there were many voluntary context switches (like 30)
  • Monitoring SHOW STATUS for changes in Innodb_buffer_pool_pages_dirty. There may be some relation between dirty pages being flushed and the slow queries, but the correlation isn't clear.

Then I decided to check the system's I/O latency with ioping. This is my first VPS, so I was surprised to see this result:

4096 bytes from . (vzfs /dev/vzfs): request=1 time=249.2 ms
4096 bytes from . (vzfs /dev/vzfs): request=2 time=12.3 ms
4096 bytes from . (vzfs /dev/vzfs): request=3 time=110.5 ms
4096 bytes from . (vzfs /dev/vzfs): request=4 time=232.8 ms
4096 bytes from . (vzfs /dev/vzfs): request=5 time=294.4 ms
4096 bytes from . (vzfs /dev/vzfs): request=6 time=704.7 ms
4096 bytes from . (vzfs /dev/vzfs): request=7 time=1115.0 ms
4096 bytes from . (vzfs /dev/vzfs): request=8 time=209.7 ms
4096 bytes from . (vzfs /dev/vzfs): request=9 time=64.2 ms
4096 bytes from . (vzfs /dev/vzfs): request=10 time=396.2 ms

Pretty erratic, I would say.

Having said all of that, I ask:

  1. Can the I/O latency be occasionally killing MySQL performance? I always thought that, when you ran an UPDATE, the thread taking care of that connection wasn't going to flush data to disk or wait for such a flush; it would return immediately and the flushing would be done by another thread at another time.

  2. If it can't be disk I/O, is there anything else I can possibly try, short of renting a dedicated server?

like image 998
Adiel Mittmann Avatar asked Mar 11 '12 14:03

Adiel Mittmann


2 Answers

I'm replying to my own question with additional data that I collected based on your answers.

I used two notebooks connected by means of a wireless network. On notebook A, I mounted a directory of notebook B using sshfs. Then on notebook A I started up MySQL specifying that mounted directory as its data directory. This should provide MySQL with a very slow I/O device. MySQL was started with innodb_flush_log_at_trx_commit = 0.

I defined 3 sets of queries, each set consisting of an update and a select query repeated 10,000 times, without explicit transactions. The experiments were:

  • US1SID: update and select on a specific row of the same table. The same row was used in all iterations.
  • US1MID: update and select on a specific row of the same table. The row was a different one in each iteration.
  • US2MID: update and select on rows of different tables. In this case, the table being read by the select didn't change at all during the experiment.

Each set was run twice using a shell script (hence timings are slower than those in my original question), one under normal conditions and the other after executing the following command:

tc qdisc replace dev wlan0 root handle 1:0 netem delay 200ms

The command above adds a mean delay of 200ms when transmitting packets through wlan0.

First, here's the mean time of the top 99% fastest updates and selects, and the bottom 1% updates and selects.

          |        Delay: 0ms        |       Delay: 200ms       |
          | US1SID | US1MID | US2MID | US1SID | US1MID | US2MID |
| top99%u | 0.0064 | 0.0064 | 0.0064 | 0.0063 | 0.0063 | 0.0063 |
| top99%s | 0.0062 | 0.0063 | 0.0063 | 0.0062 | 0.0062 | 0.0062 |
| bot01%u | 1.1834 | 1.2239 | 0.9561 | 1.9461 | 1.7492 | 1.9731 |
| bot01%s | 0.4600 | 0.5391 | 0.3417 | 1.4424 | 1.1557 | 1.6426 |

As is clear, even with really, really poor I/O performance, MySQL manages to execute most queries really fast. But what concerns me the most is the worst cases, so here's another table, showing the 10 slowest queries. A "u" means it was an update, an "s" a select.

|          Delay: 0ms         |          Delay: 200ms          |
| US1SID  | US1MID  | US2MID  | US1SID   | US1MID   | US2MID   |
| 5.443 u | 5.946 u | 5.315 u | 11.500 u | 10.860 u | 11.424 s |
| 5.581 u | 5.954 s | 5.466 u | 11.649 s | 10.995 u | 11.496 s |
| 5.863 s | 6.291 u | 5.658 u | 12.551 s | 11.020 u | 12.221 s |
| 6.192 u | 6.513 u | 5.685 u | 12.893 s | 11.370 s | 12.599 u |
| 6.560 u | 6.521 u | 5.736 u | 13.526 u | 11.387 u | 12.803 u |
| 6.562 u | 6.555 u | 5.743 u | 13.997 s | 11.497 u | 12.920 u |
| 6.872 u | 6.575 u | 5.869 u | 14.662 u | 12.825 u | 13.625 u |
| 6.887 u | 7.908 u | 5.996 u | 19.953 u | 12.860 u | 13.828 s |
| 6.937 u | 8.100 u | 6.330 u | 20.623 u | 14.015 u | 16.292 u |
| 8.665 u | 8.298 u | 6.893 u | 27.102 u | 22.042 s | 17.131 u |

Conclusions:

  1. Poor I/O performance can indeed slow MySQL to a crawl. It's not clear why or when exactly, but it does happen.

  2. The slowing down applies to both selects and updates, with updates suffering more.

  3. For some reason, even selects on a table that wasn't involved in any changes, and which had recently been populated, were also slowed down, as is clear from US2MID above.

  4. As for the test cases proposed by mentatkgs, it seems that updating different rows instead of the same ones does help a little, but doesn't solve the problem.

I guess I will either adapt my software to tolerate such delays or try to move to another provider. Renting a dedicated server is too expensive for this project.

Thank you all for the comments.

like image 125
Adiel Mittmann Avatar answered Sep 29 '22 12:09

Adiel Mittmann


As you are hosting your VPS in the cloud, you might be running into issues that are completely out of your control.

VPSs are subject to the whims of the host servers that run them. For example, CPU cycle priority at the Rackspace Cloud is weighted based on the size of the VPS. The bigger your VPS, the better likelihood that your app will perform smoothly. If there is a bigger VPS on the host you're using, it's possible that there is weighted bursting to blame. It's pretty hard to say.

Have you tried running this locally on your own machine? If it runs perfectly on your own system, and you need guaranteed performance, then your best bet will be to move to a dedicated server.

like image 22
Matt Beckman Avatar answered Sep 29 '22 12:09

Matt Beckman