How to get more Disk performance in Google Cloud?

One of the volumes on one of our Google Cloud VMs (Ubuntu 16.04) is at 100% disk utilization pretty much all the time. Here is a 10-second sample plucked at random from the system:

iostat -x 10

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdd               0.60    17.20 5450.50 2468.00 148923.60 25490.00    44.05    11.81    1.49    1.13    2.29   0.13  99.60

This is currently a 2.5 TB persistent SSD.

My understanding is that I can't get better performance by adding virtual "spindles" and then distributing the workload across them.

This is a database volume, so I can't really use ephemeral local SSD either.

I currently have XFS on it with these mount options:

type xfs (rw,noatime,nodiratime,attr2,nobarrier,inode64,noquota)
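
For reference, the matching /etc/fstab entry looks roughly like this (device and mount point are illustrative - a real system would mount by UUID):

    /dev/sdd  /var/lib/mysql  xfs  rw,noatime,nodiratime,attr2,nobarrier,inode64,noquota  0 0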

Any suggestions?

asked Oct 27 '17 by rotten



1 Answer

All persistent disk types on GCE (both HDD and SSD) are network-attached, with data replicated to remote storage for higher availability. This is also the reason for the performance limits: the available network bandwidth has to be shared fairly among the multiple tenants on the same physical machine.

GCE limits disk performance on both IOPS and bandwidth - you will be constrained by whichever limit you hit first. The reason there are two limits is that many small operations are more costly than a few large ones.

Both the IOPS and the bandwidth limits depend on three factors (a rough worked example follows the list):

  • Type (HDD vs. SSD)
  • Size of the disk (larger disks enjoy higher limits)
  • Core count (larger instances enjoy higher limits as they occupy a larger fraction of a machine)
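
As a rough illustration, using the per-GB rates the SSD persistent disk documentation listed around that time (about 30 read IOPS and 0.48 MB/s of throughput per GB - check the current docs, as these numbers change):

    2500 GB x 30 IOPS/GB   = 75,000 theoretical read IOPS
    2500 GB x 0.48 MB/s/GB = 1,200 MB/s theoretical read throughput

Both figures are above the per-instance caps of the era (on the order of 40,000 read IOPS and 800 MB/s on large instances), so a 2.5 TB SSD disk on a big machine is already at the ceiling.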

Additionally, PD traffic is factored into the per-core network egress cap.
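
To sketch what that cap means in practice (assuming the roughly 2 Gbit/s of egress per vCPU that applied at the time; PD writes count at a multiple of their raw size because each write is replicated):

    4 vCPUs x 2 Gbit/s = 8 Gbit/s, i.e. roughly 1 GB/s of egress,
    shared between the VM's own network traffic and PD write replication

So a small instance can starve its own disk writes simply by not having enough cores.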

The documentation has an in-depth article going through all these aspects. In summary, once you max out on disk size, type and core count, there is no way to increase performance further.
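
Until you hit those caps, the cheapest lever is usually disk size, since larger disks get higher per-disk limits. A minimal sketch (disk name, zone and mount point are placeholders; XFS can be grown online):

    gcloud compute disks resize my-db-disk --zone=us-central1-a --size=4TB
    sudo xfs_growfs /var/lib/mysql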

Creating RAID arrays of multiple persistent disks will not lead to increased performance as you will still hit the per-instance limit and network egress cap.

answered Oct 10 '22 by mensi