 

Key/Value store extremely slow on SSD

What I am sure of:

  • I am working with Java/Eclipse on Linux and trying to store a very large number of key/value pairs of 16/32 bytes respectively on disk. Keys are fully random, generated with SecureRandom (see the sketch after this list).
  • The speed is constant at ~50000 inserts/sec until it reaches ~1 million entries.
  • Once this limit is reached, the Java process oscillates every 1-2 seconds between 0% and 100% CPU, between 150MB and 400MB of memory, and between 10 and 100 inserts/sec.
  • I tried with both Berkeley DB and Kyoto Cabinet, and with both B-trees and hash tables. Same results.
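
To be concrete, the insert loop is essentially the sketch below (store() is just a stand-in for the actual Berkeley DB / Kyoto Cabinet put, and the record count is only an example):

    import java.security.SecureRandom;

    public class InsertLoop {
        static final long TOTAL_RECORDS = 100000000L; // example target size

        public static void main(String[] args) {
            SecureRandom rnd = new SecureRandom();
            byte[] key = new byte[16];
            byte[] value = new byte[32];
            for (long i = 0; i < TOTAL_RECORDS; i++) {
                rnd.nextBytes(key);    // fully random 16-byte key
                rnd.nextBytes(value);  // 32-byte value
                store(key, value);     // stand-in for Database.put(...) / DB.set(...)
            }
        }

        static void store(byte[] key, byte[] value) {
            // placeholder for the actual key/value store call
        }
    }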

What might contribute:

  • It's writing on SSD.
  • For every insert there are on average 1.5 reads, so reads and writes constantly alternate.

I suspect the nice 50000 rate holds until some cache/buffer limit is reached. Then the big slowdown might be due to the SSD not handling mixed reads and writes well, as suggested in this question: Low-latency Key-Value Store for SSD.

Question is:
Where might this extreme slowdown come from? It can't be all the SSD's fault. Lots of people happily use SSDs for high-speed DB workloads, and I'm sure they mix reads and writes a lot.

Thanks.

Edit: I've made sure to remove any memory limit; the Java process always has room to allocate more memory.
Edit: Removing the reads and doing inserts only does not change the problem.

Last Edit: For the record, for hash tables it seems related to the initial number of buckets. In Kyoto Cabinet that number cannot be changed and defaults to ~1 million, so it's better to get the number right at creation time (1 to 4 times the maximum number of records to store). With BDB, the number of buckets is designed to grow progressively, but since that is resource-consuming, it's better to predefine the number in advance.
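
With the BDB Java API, that hint can apparently be given at creation time; a rough sketch (method names from memory, so double-check against the docs):

    import com.sleepycat.db.Database;
    import com.sleepycat.db.DatabaseConfig;
    import com.sleepycat.db.DatabaseType;

    public class CreateHashDb {
        public static void main(String[] args) throws Exception {
            DatabaseConfig cfg = new DatabaseConfig();
            cfg.setAllowCreate(true);
            cfg.setType(DatabaseType.HASH);
            // Hint the expected record count so BDB sizes the bucket
            // array up front instead of splitting buckets during the load.
            cfg.setHashNumElements(200000000); // expected number of records (example)
            Database db = new Database("store.db", null, cfg);
            // ... inserts ...
            db.close();
        }
    }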

asked Oct 23 '12 by Kai Elvin

1 Answer

Your problem might be related to the strong durability guarantees of the databases you are using.

Basically, for any database that is ACID-compliant, at least one fsync() call per database commit will be necessary. This has to happen in order to guarantee durability (otherwise, updates could be lost in case of a system failure), but also to guarantee internal consistency of the database on disk. The database API will not return from the insert operation before the completion of the fsync() call.

fsync() can be a very heavy-weight operation on many operating systems and disk hardware, even on SSDs. (An exception to that would be battery- or capacitor-backed enterprise SSDs - they can treat a cache flush operation basically as a no-op to avoid exactly the delay you are probably experiencing.)
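
To get a feeling for what a single sync costs on your particular SSD, you can time FileChannel.force() (which maps to fsync/fdatasync on Linux) after each small write; a rough, plain-JDK sketch:

    import java.io.RandomAccessFile;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;

    public class FsyncCost {
        public static void main(String[] args) throws Exception {
            try (RandomAccessFile raf = new RandomAccessFile("fsync-test.bin", "rw");
                 FileChannel ch = raf.getChannel()) {
                ByteBuffer buf = ByteBuffer.allocate(48); // ~ one 16-byte key + 32-byte value
                int rounds = 1000;
                long start = System.nanoTime();
                for (int i = 0; i < rounds; i++) {
                    buf.clear();
                    ch.write(buf);
                    ch.force(true); // roughly what one durable commit pays
                }
                long avgNanos = (System.nanoTime() - start) / rounds;
                System.out.println("avg write+force: " + avgNanos + " ns");
            }
        }
    }

If that number is in the millisecond range, one fsync() per insert alone caps you at a few hundred inserts per second.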

A solution would be to do all your stores inside one big transaction. I don't know about Berkeley DB, but for SQLite, performance can be greatly improved that way.
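
I haven't tested this against Berkeley DB myself, but if its Java binding works the way I expect, batching would look roughly like this (names may need adjusting against the BDB docs):

    import com.sleepycat.db.Database;
    import com.sleepycat.db.DatabaseEntry;
    import com.sleepycat.db.Environment;
    import com.sleepycat.db.Transaction;

    public class BatchLoad {
        // One transaction per batch means one durable commit (one fsync)
        // per batch instead of one per record.
        static void loadBatch(Environment env, Database db,
                              byte[][] keys, byte[][] values) throws Exception {
            Transaction txn = env.beginTransaction(null, null);
            for (int i = 0; i < keys.length; i++) {
                db.put(txn, new DatabaseEntry(keys[i]), new DatabaseEntry(values[i]));
            }
            txn.commit(); // the single sync point for the whole batch
        }
    }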

To figure out whether that is your problem at all, you could watch your database-writing process with strace and look for frequent fsync() calls (more than a few per second would be a pretty strong hint).

Update: If you are absolutely sure that you don't require durability, you can try the answer from Optimizing Put Performance in Berkeley DB; if you do, you should look into the TDS (transactional data storage) feature of Berkeley DB.
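
If you decide durability is expendable, the relevant knobs in the BDB Java binding should be the no-sync commit options on the environment; a sketch (again, I'm guessing at your exact setup, so verify the names):

    import java.io.File;
    import com.sleepycat.db.Environment;
    import com.sleepycat.db.EnvironmentConfig;

    public class NoSyncEnv {
        public static void main(String[] args) throws Exception {
            EnvironmentConfig envCfg = new EnvironmentConfig();
            envCfg.setAllowCreate(true);
            envCfg.setInitializeCache(true);
            envCfg.setInitializeLocking(true);
            envCfg.setInitializeLogging(true);
            envCfg.setTransactional(true);
            // Commits return without waiting for fsync(): much faster,
            // but a crash can lose the most recent transactions.
            envCfg.setTxnWriteNoSync(true);
            Environment env = new Environment(new File("env-dir"), envCfg);
            // ... open Database, insert, close ...
            env.close();
        }
    }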

answered Oct 31 '22 by lxgr