
multithread read from disk?

Suppose I need to read many distinct, independent chunks of data from the same file saved on disk.

Is it possible to multi-thread these reads?

Related: Do all threads on the same processor use the same IO device to read from disk? In this case, multi-threading would not speed up the reads at all, because the threads would just be waiting in line.

(I am currently multi-threading with OpenMP.)

cmo asked Nov 16 '12 17:11



2 Answers

Yes, it is possible. However:

Do all threads on the same processor use the same IO device to read from disk?

Yes: on a hard disk, they all share the read head. As an example, try copying two files in parallel as opposed to in series. It will take significantly longer in parallel, because the OS uses scheduling algorithms to make sure the IO rate is "fair," or equal between the two threads/processes. Because of this, the read head will jump back and forth between different parts of the disk, which slows the process down a lot. The time to actually read the data is pretty small compared to the time to seek to it, so when you're reading two different parts of the disk at once, you spend most of the time seeking.

Note that all of this assumes you're using a hard disk. An SSD has no read head to move, so reading in parallel will not be slower; according to the comments, parallel is actually faster on an SSD. With RAID the situation becomes more complicated, and (obviously) depends on what kind of RAID you're using.

This is what it looks like (I've unwrapped the circular disk into a rectangle because ASCII circles are hard, and simplified the data layout to make it easier to read):

Assume the files are separated by some space on the platter like so:

|         |

A series read will look like (* indicates reading)

space ----->
|        *|  t
|        *|  i
|        *|  m
|        *|  e
|        *|  |
|       / |  |
|     /   |  |
|   /     |  V
|  /      |
|*        |
|*        |
|*        |
|*        |

While a parallel read will look like

|       \ |
|        *|
|       / |
|     /   |
|   /     |
|  /      |
|*        |
|  \      |
|    \    |
|     \   |
|       \ |
|        *|
|       / |
|     /   |
|   /     |
|  /      |
|*        |
|  \      |
|    \    |
|     \   |
|       \ |
|        *|

etc
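
For reference, here is a minimal sketch of what multi-threaded chunk reads could look like in C with OpenMP, since that is what the question mentions. The names chunk_t, read_chunk, and read_chunks are hypothetical and only for illustration. It uses the POSIX pread call, which takes an explicit offset, so the threads can share one file descriptor without fighting over a single file position. Whether this ends up faster or slower than a serial loop depends on the device, as described above.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical descriptor for one independent chunk of the file. */
typedef struct {
    off_t   offset;  /* where the chunk starts in the file */
    size_t  length;  /* how many bytes to read */
    char   *buffer;  /* caller-provided destination, at least `length` bytes */
    ssize_t result;  /* bytes actually read, or -1 on error */
} chunk_t;

/* Read up to c->length bytes at c->offset; stops early at EOF or on error. */
static void read_chunk(int fd, chunk_t *c)
{
    assert(c != NULL && c->buffer != NULL);
    size_t done = 0;
    while (done < c->length) {
        /* pread does not move the shared file position, so it is safe
           to call from several threads on the same descriptor. */
        ssize_t n = pread(fd, c->buffer + done, c->length - done,
                          c->offset + (off_t)done);
        if (n < 0) { c->result = -1; return; }
        if (n == 0) break;              /* end of file */
        done += (size_t)n;
    }
    c->result = (ssize_t)done;
}

/* Returns 0 if the file was opened, -1 otherwise.
   Per-chunk outcomes are stored in chunks[i].result. */
int read_chunks(const char *path, chunk_t *chunks, int count)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    /* Each iteration is independent, so OpenMP may hand them to
       different threads. Without -fopenmp the pragma is ignored
       and the loop simply runs serially. */
    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < count; i++)
        read_chunk(fd, &chunks[i]);

    close(fd);
    return 0;
}
```

Timing this against the same loop with the pragma removed, on the actual target disk, is the simplest way to find out which side of the seek trade-off you are on.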

Dan answered Oct 07 '22 22:10


If you're doing this on Windows, you might want to look into the ReadFileScatter function. It reads a contiguous range of a file into multiple separate buffers in a single asynchronous call, so chunks at different file offsets still need one call each. Issuing the reads asynchronously lets the OS better manage the file IO bottleneck and, hopefully, optimize the reads.

The matching write call on Windows would be WriteFileGather.

For UNIX you're looking at readv and writev to do the same thing.
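
As a rough illustration of readv, here is a small sketch in C. The function name read_into_two and its parameters are made up for this example. It seeks to a starting offset and then lets a single readv call fill two separate buffers, in order, from the contiguous bytes that follow that offset.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: read one contiguous region starting at `offset`
   and split it across two caller-provided buffers.
   Returns the total number of bytes read, or -1 on error. */
ssize_t read_into_two(const char *path, off_t offset,
                      void *first, size_t first_len,
                      void *second, size_t second_len)
{
    assert(first != NULL && second != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    if (lseek(fd, offset, SEEK_SET) == (off_t)-1) {
        close(fd);
        return -1;
    }

    /* readv fills the buffers in array order: `first` is filled
       completely before any byte goes into `second`. */
    struct iovec iov[2] = {
        { .iov_base = first,  .iov_len = first_len  },
        { .iov_base = second, .iov_len = second_len },
    };
    ssize_t n = readv(fd, iov, 2);

    close(fd);
    return n;
}
```

The return value can be smaller than first_len + second_len, for example near the end of the file, so callers should check it rather than assume both buffers were filled.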

Fox Cutter answered Oct 07 '22 23:10