
Low-latency Key-Value Store for SSD

We are working on an SSD-backed key-value solution with the following properties:

  • Throughput: 10,000 TPS, split 50/50 between puts and gets
  • Latency: 1 ms average, 10 ms at the 99.9th percentile
  • Data volume: ~1 billion values of ~150 bytes each; 64-bit keys; random access; 20% of the data fits in RAM
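
As a rough back-of-the-envelope check of what these numbers imply, the arithmetic can be sketched as below. The inputs come from the list above; the 8-byte offset per index entry and the choice to count keys as part of "the data" are assumptions made only for illustration, and real stores add per-record and hash-table overhead on top.

```python
# Back-of-the-envelope sizing for the workload described above.
NUM_VALUES = 1_000_000_000  # ~1 billion values (from the question)
VALUE_BYTES = 150           # ~150 bytes per value (from the question)
KEY_BYTES = 8               # 64-bit keys (from the question)
OFFSET_BYTES = 8            # ASSUMPTION: one 64-bit file offset per index entry


def sizing():
    """Return raw sizes in bytes, ignoring any per-record or index overhead."""
    values = NUM_VALUES * VALUE_BYTES
    keys = NUM_VALUES * KEY_BYTES
    data = values + keys  # ASSUMPTION: "data" means keys plus values
    return {
        "values": values,
        "keys": keys,
        "data": data,
        "ram": data // 5,  # 20% of the data fits in RAM
        "index": NUM_VALUES * (KEY_BYTES + OFFSET_BYTES),
    }


if __name__ == "__main__":
    for name, size in sizing().items():
        print(f"{name:>6}: {size / 10**9:.1f} GB")
```

Under these assumptions the raw data is about 158 GB, about 31.6 GB of it fits in RAM, and a bare key-to-offset index for every key would need about 16 GB before any data-structure overhead.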

We tried KyotoCabinet, LevelDB, and RethinkDB on commodity SSDs, with different Linux IO schedulers and with ext3 and xfs file systems. We ran a number of tests using Rebench and found that in all cases:

  • Read-only throughput/latency are very good
  • Write/update-only throughput is moderate, but there are many high-latency outliers
  • Mixed read/write workload causes catastrophic oscillation in throughput/latency, even with direct access to the block device (bypassing the file system)
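
For reference, one way to check a set of measured latencies against the stated SLA (1 ms average, 10 ms at the 99.9th percentile) is sketched below. This is not how Rebench reports results; the function names are hypothetical, and the nearest-rank percentile definition is an assumption, since other definitions interpolate between samples.

```python
def latency_summary(samples_ms):
    """Return (mean, 99.9th percentile) of latency samples given in milliseconds.

    Uses the nearest-rank percentile: the smallest sample such that at
    least 99.9% of all samples are less than or equal to it.
    """
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    n = len(ordered)
    mean = sum(ordered) / n
    rank = -(-999 * n // 1000)  # ceil(0.999 * n) in integer arithmetic, 1-based
    return mean, ordered[rank - 1]


def meets_sla(samples_ms, mean_limit_ms=1.0, p999_limit_ms=10.0):
    """Check samples against the limits from the question (defaults are the SLA)."""
    mean, p999 = latency_summary(samples_ms)
    return mean <= mean_limit_ms and p999 <= p999_limit_ms


if __name__ == "__main__":
    # Hypothetical samples: mostly fast operations with two slow outliers.
    samples = [0.5] * 998 + [20.0, 50.0]
    print(latency_summary(samples), meets_sla(samples))
```

With 1,000 samples, the 99.9th percentile tolerates exactly one outlier, so a second slow operation is enough to break the 10 ms limit even when the average stays well under 1 ms.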

The picture below illustrates this behavior for KyotoCabinet (the horizontal axis is time; three periods are clearly visible: read-only, mixed, and update-only).

The question is: is it possible to meet the SLAs described above using SSDs, and if so, which key-value stores are recommended?

[Image: KyotoCabinet throughput/latency over time, showing the read-only, mixed, and update-only periods]

user1128016 asked May 14 '12 05:05


2 Answers

Highly variable write latency is a common attribute of SSDs (especially consumer models). There is a pretty good explanation of why in this AnandTech review.

In summary, SSD write performance worsens over time as the wear-leveling overhead increases. As the number of free pages on the drive decreases, the NAND controller must start defragmenting pages, which adds latency. The drive must also maintain an LBA-to-block map to track how data is distributed across the NAND blocks. As this map grows, operations on it (inserts, deletions) get slower.

You aren't going to be able to solve a low-level hardware issue with a software approach. You will need to either move up to an enterprise-level SSD or relax your latency requirements.

Dan B answered Nov 12 '22 05:11


Aerospike is a newer key/value (row) store that can run completely off of SSDs with sub-millisecond (< 1 ms) read/write latency and very high TPS (reaching into the millions).

SSDs have great random-read performance, but the key to reducing variance on writes is sequential IO (as with regular hard disks). Sequential IO also greatly reduces the wear leveling and fade that can occur with lots of writes on SSDs.

If you're building your own key-value system, use a log-structured approach (like Aerospike) so that writes are batched and appended in large chunks. An in-memory index can track the current location of each value, while a background process cleans stale/deleted data from disk and defrags files.
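
To make the idea concrete, here is a minimal sketch of a log-structured store in Python. It is a toy under stated assumptions, not Aerospike's design: the `LogStore` class, its record layout (64-bit key, 32-bit length, then the value), and the single-file layout are all hypothetical. Deletes, crash recovery for partially written records, and fsync policy are left out, and compaction runs inline instead of in a background process.

```python
import os
import struct

# ASSUMPTION: each record is a 64-bit key, a 32-bit value length, then the value.
HEADER = struct.Struct("<QI")


class LogStore:
    """Toy append-only key-value store: one log file plus an in-memory index."""

    def __init__(self, path):
        self.path = path
        self.index = {}  # key -> (offset of value bytes, value length)
        self.log = open(path, "a+b")  # append mode: every write lands at the end
        self._rebuild_index()

    def _rebuild_index(self):
        """Scan the log from the start; later records override earlier ones."""
        self.index.clear()
        self.log.seek(0)
        offset = 0
        while True:
            header = self.log.read(HEADER.size)
            if len(header) < HEADER.size:
                break
            key, length = HEADER.unpack(header)
            self.index[key] = (offset + HEADER.size, length)
            self.log.seek(length, os.SEEK_CUR)  # skip over the value bytes
            offset += HEADER.size + length

    def put(self, key, value):
        """Append a record; an older value for the same key becomes stale."""
        self.log.seek(0, os.SEEK_END)
        offset = self.log.tell()
        self.log.write(HEADER.pack(key, len(value)) + value)
        self.index[key] = (offset + HEADER.size, len(value))

    def get(self, key):
        """One index lookup plus one random read from the log."""
        entry = self.index.get(key)
        if entry is None:
            return None
        offset, length = entry
        self.log.seek(offset)  # seeking also flushes buffered appends
        return self.log.read(length)

    def compact(self):
        """Rewrite only live records into a new file, dropping stale ones."""
        tmp_path = self.path + ".compact"
        new_index = {}
        with open(tmp_path, "wb") as out:
            for key, (offset, length) in self.index.items():
                self.log.seek(offset)
                value = self.log.read(length)
                new_index[key] = (out.tell() + HEADER.size, length)
                out.write(HEADER.pack(key, length) + value)
        self.log.close()
        os.replace(tmp_path, self.path)
        self.log = open(self.path, "a+b")
        self.index = new_index

    def close(self):
        self.log.close()
```

Here the file object's buffer stands in for write batching. A real implementation would more likely collect records into large, aligned blocks before writing them to the device, and would run compaction incrementally so that it does not compete with foreground writes.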

Mani Gandham answered Nov 12 '22 05:11