Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Reading in parallel from multiple hard drives

I am writing an application that deals with lots of data (gigabytes). I am considering splitting the data onto multiple hard drives and reading it in parallel. I am wondering what kind of limitations I will run into--for example, is it possible to read from 4 or 8 hard drives in parallel, and will I get approximately 4 or 8 times the performance if disk I/O is the limiting factor? What should I look out for? Pointers to relevant docs are also appreciated--Google didn't turn up much.

EDIT: I should point out that I've looked at RAID, but the performance wasn't as good as I was hoping for. I am planning on writing this myself in C/C++.

like image 550
Sam Lee Avatar asked Mar 09 '26 02:03

Sam Lee


2 Answers

Well splitting data and reading from 4 to 8 drives in parallel would not jump the throughput by 4 to 8 times. There are other factors which you need to consider.

  1. If you reading data in the application, then threads might be required to read data from different harddisks.
  2. Windows provide overlapped and non-overlapped method of reading and writing data to hdd. See if using that increases the throughput. Same way *nux would also have read/write methods.
  3. On a single core/processor threads appear to run in parallel but its sequentially underlying. With multicore multiple threads can be read in parallel but generally OS decides what to run and when to run. So having so many threads to read might decrease performance than increase.
  4. If you check specs of any harddisk, you would see it gives random access time and sequential access time. So based on you data you may want to check these parameters.
  5. When you spliting data into different drives you need to keep in mind that your application would require syncronization of how to populate data into meaningful information. If you using threads, additionally threads should be in sync.
  6. You may get state of the art harddisk with high data read/write speeds but you other hardware may be the weak link. So you may be using a low-end motherboard or RAM which may not let you get the best of the speeds.
like image 158
K Singh Avatar answered Mar 11 '26 05:03

K Singh


If you're not going to use real RAID, you better at least use multiple hard drive controllers, otherwise you won't see much performance gain at all. One controller can't do lots of concurrent IO so it will quickly become the bottleneck.

like image 32
dj_segfault Avatar answered Mar 11 '26 05:03

dj_segfault



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!