Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Algorithms for Optimization with Fast Disk Storage (SSDs)?

Given that Solid State Disks (SSDs) are decreasing in price and soon will become more prevalent as system drives, and given that their access rates are significantly higher than rotating magnetic media, what standard algorithms will gain in performance from the use of SSDs for local storage? For example, the high random read speed of SSDs makes something like a disk-based hashtable a viability for large hashstables; 4GB of disk space is readily available, which makes hashing to the entire range of a 32-bit integer viable (more for lookup than population, though, which would still take a long time); while this size of a hashtable would be prohibitive to work with with rotating media due to the access speed, it shouldn't be as much of an issue with SSDs.

Are there any other areas where the impending transition to SSDs will provide potential gains in algorithmic performance? I'd rather see reasoning as to how one thing will work rather than opinion; I don't want this to turn contentious.

like image 529
Paul Sonier Avatar asked Jun 16 '09 21:06

Paul Sonier


People also ask

Which type of disk drive can provide a very fast boot?

Instead, they use integrated circuits (ICs) just like third-generation computers. This makes them more durable, faster, and less prone to damage and corruption. SSD hard drives have a notable advantage of transferring data at speed of 550 MB/S and allow faster boot times than the types of hard drives before them.

What is SSD in computer?

Solid-state drives (SSDs) are the most common storage drives today. SSDs are smaller and faster than hard disk drives (HDDs). SSDs are noiseless and allow PCs to be thinner and more lightweight. Hard disk drives (HDDs) are more common in older devices.

How does an SSD work?

SSDs store data permanently inside an integrated circuit, typically using flash memory. The flash memory inside an SSD means data is written, transferred, and erased electronically and silently — SSDs don't have the moving parts found inside mechanical hard-disk drives (HDDs).

What is the use of SSD in laptop?

SSDs offers shorter boot times for your computer, more immediate data transfer, and higher bandwidth. Faster speeds mean SSDs can handle data at the ultra-speeds necessary in today's business world especially when running programs that access large amounts of data such as an operating system.


2 Answers

Your example of hashtables is indeed the key database structure that will benefit. Instead of having to load a whole 4GB or more file into memory to probe for values, the SSD can be probed directly. The SSD is still slower than RAM, by orders of magnitude, but it's quite reasonable to have a 50GB hash table on disk, but not in RAM unless you pay big money for big iron.

An example is chess position databases. I have over 50GB of hashed positions. There is complex code to try to group related positions near each other in the hash, so I can page in 10MB of the table at a time and hope to reuse some of it for multiple similar position queries. There's a ton of code and complexity to make this efficient.

Replaced with an SSD, I was able to drop all the complexity of the clustering and just use really dumb randomized hashes. I also got an increase in performance since I only fetch the data I need from the disk, not big 10MB chunks. The latency is indeed larger, but the net speedup is significant.. and the super-clean code (20 lines, not 800+), is perhaps even nicer.

like image 119
SPWorley Avatar answered Sep 26 '22 07:09

SPWorley


SSDs are only significantly faster for random access. Sequential access to disk they are only twice as performant as mainstream rotational drives. Many SSDs have poorer performance in many scenarios causing them to perform worse, as described here.

While SSDs do move the needle considerably, they are still much slower than CPU operations and physical memory. For your 4GB hash table example, you may be able to sustain 250+ MB/s off of an SSD for accessing random hash table buckets. For a rotational drive, you'd be lucky to break the single digit MB/s. If you can keep this 4 GB hash table in memory, you could access it on the order of gigabytes a second - much faster than even a very swift SSD.

The referenced article lists several changes MS made for Windows 7 when running on SSD's, which can give you an idea of the sort of changes you could consider making. First, SuperFetch for prefetching data off of disk is disabled - it's designed to get around slow random access times for disk which are alleviated by SSDs. Defrag is disabled, because having files scattered across the disk aren't a performance hit for SSDs.

like image 26
Michael Avatar answered Sep 23 '22 07:09

Michael