Fastest way to read many 300-byte chunks randomly by file offset from a 2TB file?

Question

I have some 2TB read-only files (never written to once created) on a RAID 5 system (4 x 3TB disks at 7.2k rpm).

Now I have some threads that want to read portions of such a file. Every thread has an array of chunks it needs. Every chunk is addressed by a file offset (position) and a size to read (mostly about 300 bytes).

What is the fastest way to read this data? I don't care about CPU cycles; (disk) latency is what counts. So, if possible, I want to take advantage of the NCQ of the hard disks.

As the files are highly compressed, will be accessed randomly, and I know the exact position of each chunk, I see no other way to optimize this.

Should I pool the file reading to one thread?
Should I keep the file open?
Should every thread (maybe about 30) keep every file open simultaneously? What about new threads that come in (from the web server)?
Will it help if I wait 100ms and sort my reads by file offset (lowest first)?

What is the best way to read the data? Do you have any experience, tips, or hints?

Eugen Rieck · Accepted Answer

The optimum number of parallel requests depends heavily on factors outside your app (e.g. disk count = 4, NCQ depth = ?, driver queue depth = ?, ...), so you might want to use a system that can adapt or be adapted. My recommendation is:

Write all your read requests into a queue, together with some metadata that allows the worker to notify the requesting thread
Have N threads dequeue from that queue, synchronously read the chunk, and notify the requesting thread
Make N changeable at runtime
Since CPU is not your concern, your worker threads can calculate a floating latency average (and/or maximum, depending on your needs)
Slide N up and down until you hit the sweet spot
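The steps above could be sketched roughly as follows. This is only a minimal illustration, not a tuned implementation: `ReadRequest`, `ChunkReaderPool`, `Enqueue` and `SetWorkerCount` are hypothetical names, `BlockingCollection<T>` stands in for the lockless queue, the requester is notified through a `TaskCompletionSource<byte[]>`, and the 4096-byte stream buffer and 100ms poll interval are guesses.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical request type: where to read, plus a completion source that
// lets a worker hand the result back to the requesting thread.
public sealed class ReadRequest
{
    public readonly long Offset;
    public readonly int Length;
    public readonly TaskCompletionSource<byte[]> Completion =
        new TaskCompletionSource<byte[]>(TaskCreationOptions.RunContinuationsAsynchronously);

    public ReadRequest(long offset, int length) { Offset = offset; Length = length; }
}

public sealed class ChunkReaderPool : IDisposable
{
    private readonly BlockingCollection<ReadRequest> _queue = new BlockingCollection<ReadRequest>();
    private readonly object _gate = new object();
    private readonly string _path;
    private int _target; // desired N
    private int _live;   // workers currently running

    public ChunkReaderPool(string path, int workers)
    {
        _path = path;
        SetWorkerCount(workers);
    }

    // Called by requesting threads; the returned task completes when a worker has read the chunk.
    public Task<byte[]> Enqueue(long offset, int length)
    {
        var request = new ReadRequest(offset, length);
        _queue.Add(request);
        return request.Completion.Task;
    }

    // N is changeable at runtime: missing workers are started here,
    // surplus workers notice the lower target and exit on their own.
    public void SetWorkerCount(int n)
    {
        lock (_gate)
        {
            _target = n;
            while (_live < _target)
            {
                _live++;
                new Thread(WorkerLoop) { IsBackground = true }.Start();
            }
        }
    }

    private bool ShouldExit()
    {
        lock (_gate)
        {
            if (_live <= _target) return false;
            _live--;
            return true;
        }
    }

    private void WorkerLoop()
    {
        // Each worker opens the file once and keeps it open for its whole lifetime.
        using (var file = new FileStream(_path, FileMode.Open, FileAccess.Read,
                                         FileShare.Read, 4096, FileOptions.RandomAccess))
        {
            while (!ShouldExit())
            {
                ReadRequest request;
                if (!_queue.TryTake(out request, 100)) // poll so a lowered N is noticed
                {
                    if (_queue.IsCompleted) return;
                    continue;
                }
                try
                {
                    var buffer = new byte[request.Length];
                    file.Seek(request.Offset, SeekOrigin.Begin);
                    int total = 0;
                    while (total < buffer.Length) // synchronous read of the whole chunk
                    {
                        int n = file.Read(buffer, total, buffer.Length - total);
                        if (n == 0) throw new EndOfStreamException();
                        total += n;
                    }
                    request.Completion.SetResult(buffer);
                }
                catch (Exception ex)
                {
                    request.Completion.SetException(ex);
                }
            }
        }
    }

    // Stops accepting requests; workers exit once the queue has drained.
    public void Dispose() { _queue.CompleteAdding(); }
}
```

A requesting thread would call something like `pool.Enqueue(offset, 300).Result` (or await the task), while a tuning loop calls `SetWorkerCount` with different values of N.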

Why sync reads? They have lower latency than async reads. Why waste latency on a queue? A good lockless queue implementation starts at less than 10ns latency, much less than two thread switches.
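The floating latency average mentioned in the recommendation could be tracked along these lines. This sketch assumes an exponential moving average is an acceptable form of "floating average"; the class name `LatencyAverage`, its members, and the default smoothing factor of 0.05 are all made up for illustration.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: exponentially weighted moving average (and maximum)
// of read latencies in milliseconds, shared by all worker threads.
public sealed class LatencyAverage
{
    private readonly double _alpha; // weight of the newest sample, 0 < alpha <= 1
    private readonly object _gate = new object();
    private double _averageMs;
    private double _maxMs;
    private bool _hasSample;

    public LatencyAverage(double alpha = 0.05) { _alpha = alpha; }

    public void Record(double milliseconds)
    {
        lock (_gate)
        {
            // The first sample seeds the average; later samples pull it towards themselves.
            _averageMs = _hasSample
                ? _averageMs + _alpha * (milliseconds - _averageMs)
                : milliseconds;
            _hasSample = true;
            if (milliseconds > _maxMs) _maxMs = milliseconds;
        }
    }

    // Times one synchronous read and records how long it took.
    public T Measure<T>(Func<T> read)
    {
        long start = Stopwatch.GetTimestamp();
        try { return read(); }
        finally
        {
            Record((Stopwatch.GetTimestamp() - start) * 1000.0 / Stopwatch.Frequency);
        }
    }

    public double AverageMs { get { lock (_gate) return _averageMs; } }
    public double MaxMs { get { lock (_gate) return _maxMs; } }
}
```

A worker would wrap each read in `Measure(...)`, and whatever adjusts N can compare `AverageMs` before and after a change to decide which direction to slide.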

Update: Some Q/A

Should the read threads keep the files open? Yes, definitely so.

Would you use a FileStream with FileOptions.RandomAccess? Yes.

You write "synchronously read the chunk". Does this mean every single read thread should start reading a chunk from disk as soon as it dequeues an order to read a chunk? Yes, that's what I meant. The queue depth of read requests is managed by the thread count.
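Taken together, the three answers above suggest that each read thread opens the file once with `FileOptions.RandomAccess`, keeps it open, and reads synchronously as soon as it has an order. A minimal sketch of that per-thread reader, assuming a hypothetical wrapper class `ChunkFile` and an arbitrary 4096-byte stream buffer:

```csharp
using System;
using System.IO;

// Hypothetical per-thread reader. It is not thread-safe: Seek and Read share
// one stream position, so every read thread should own its own instance.
public sealed class ChunkFile : IDisposable
{
    private readonly FileStream _stream;

    public ChunkFile(string path)
    {
        // Opened once and kept open; RandomAccess hints to the OS that
        // reads will not be sequential.
        _stream = new FileStream(path, FileMode.Open, FileAccess.Read,
                                 FileShare.Read, 4096, FileOptions.RandomAccess);
    }

    // Synchronously reads exactly `length` bytes starting at `offset`.
    public byte[] ReadChunk(long offset, int length)
    {
        var buffer = new byte[length];
        _stream.Seek(offset, SeekOrigin.Begin);
        int total = 0;
        while (total < length)
        {
            // Read may return fewer bytes than requested, so loop until done.
            int n = _stream.Read(buffer, total, length - total);
            if (n == 0) throw new EndOfStreamException("Chunk extends past the end of the file.");
            total += n;
        }
        return buffer;
    }

    public void Dispose() { _stream.Dispose(); }
}
```

`FileShare.Read` lets all read threads hold their own handle to the same file at the same time, which matches the files being read-only once created.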

Tags: c#, .net, file-io, binary-data

Chris
