
Does reading a large file consume CPU?

Suppose I want to do the following operations on my 2-core machine:

  1. Read a very large file

  2. Compute

Does the file reading operation need to consume 1 core? Previously I just created 2 threads, one to read the file and one to compute. Should I create an additional thread for the computation?

Thanks.

Edit

Thanks guys. Yes, we should always consider whether the file I/O blocks the computing. Now let's assume that the file I/O never blocks the computing: the computing doesn't depend on the file's data, and we just read the file in for future processing. We have 2 cores, we need to read in a file, and we need to do the computing. Since, as most of you have already pointed out, file reading consumes very little CPU, is the best solution to create 3 threads, 1 for file reading and 2 for computing?

Baiyan Huang asked Oct 18 '25 14:10


2 Answers

It depends on how your hardware is configured. Normally, reading is not CPU-intensive, thanks to DMA. It can become very expensive, though, if loading the data into memory forces other applications to be swapped out. But there is more to it.

Don't read a huge file at once if you can

If your file is really big, you should use mmap or sequential processing whenever you don't need the whole file at once. Try to consume it in chunks if possible.

For example, to sum all values in a huge file, you don't need to load the file into memory. You can process it in small chunks, accumulating the sum as you go. Memory is an expensive resource in most situations.
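A minimal sketch of that idea, assuming a text file with one number per line (the file format and the name `sum_values` are illustrative assumptions, not something from the question). Iterating over the file object pulls data through a small internal buffer, so memory use stays roughly constant regardless of file size:

```python
def sum_values(path):
    """Sum the numbers in a file (one per line) without loading it whole."""
    total = 0.0
    with open(path, "r", encoding="utf-8") as f:
        # The file object yields one line at a time; only a small
        # buffer is held in memory, not the whole file.
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                total += float(line)
    return total
```

For binary data the same pattern applies with `f.read(chunk_size)` in a loop instead of line iteration.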

Reading is sequential

> Does the file reading operation need to consume 1 core?

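Yes, I think most low-level read operations are implemented sequentially, so a single read is handled by one thread (at most 1 core), which spends most of its time waiting for the device rather than computing.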

You can avoid blocking on a read operation if you use asynchronous I/O, but it is just a variation of the same "read in small chunks" technique. You can launch several small asynchronous read operations at once, but you always have to check that an operation has finished before you use its result.
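One way to sketch "several small reads in flight, check each before use" is shown below. It is a stand-in, not true kernel-level asynchronous I/O: it uses a thread pool and the POSIX-only `os.pread`, and the name `iter_chunks_async` and the default sizes are assumptions made for illustration. At most `in_flight` reads are pending at a time, and `Future.result()` is the "has it finished?" check:

```python
import os
from collections import deque
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 64 * 1024  # illustrative value, tune by measurement


def iter_chunks_async(path, chunk_size=CHUNK_SIZE, in_flight=4):
    """Yield the file's chunks in order while later reads are already running."""
    size = os.path.getsize(path)
    fd = os.open(path, os.O_RDONLY)
    try:
        with ThreadPoolExecutor(max_workers=in_flight) as pool:
            pending = deque()
            for offset in range(0, size, chunk_size):
                # Launch a positional read; it runs in a worker thread.
                pending.append(pool.submit(os.pread, fd, chunk_size, offset))
                if len(pending) >= in_flight:
                    # result() blocks until this particular read is done.
                    yield pending.popleft().result()
            while pending:  # drain the reads still in flight
                yield pending.popleft().result()
    finally:
        os.close(fd)
```

The caller consumes it like any iterator, for example `for chunk in iter_chunks_async("big.dat"): ...`, and never holds more than a few chunks in memory.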

See also this Stack Overflow answer to a related question.

Reading and computing in parallel

> Previously I just created 2 threads, one to read the file and one to compute. Should I create an additional thread for the computation?

It depends. If you need all the data before you can start computing, then there is no reason to run the computation in parallel with the read: it would effectively have to wait until the reading is done.

If you can start computing with partial data, you likely don't need to read the whole file at once. And it is usually much better not to do so with huge files.

What is your bottleneck — computation or IO?

Finally, you should know whether your task is computation-bound or input-output bound. If it is limited by the performance of the input-output subsystem, there is little benefit in parallelizing the computation. If the computation is very CPU-intensive and the reading time is negligible, you can benefit from parallelizing the computation. Input-output is usually the bottleneck unless you are doing some number-crunching.
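A rough way to find out is to compare wall-clock time with CPU time for each phase. The sketch below is only an illustration; `measure` and `read_all_chunks` are hypothetical helper names, and the 1 MiB chunk size is an arbitrary assumption:

```python
import time


def measure(func, *args):
    """Run func(*args) and return (result, wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # user + system CPU time of this process
    result = func(*args)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, wall, cpu


def read_all_chunks(path, chunk_size=1 << 20):
    """Read the file chunk by chunk and return the number of bytes seen."""
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            total += len(chunk)
    return total
```

Calling `measure(read_all_chunks, path)` and then `measure` on your compute function gives two pairs of numbers. If the read's CPU time is far below its wall time, the process is mostly waiting on the device; if the compute phase dominates the total wall time, parallelizing it is more likely to pay off. Note that a second run over the same file may be served from the operating system's cache and look much faster than a cold read.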

sastanin answered Oct 20 '25 23:10


This is a good candidate for parallelization, because you have two types of operations here: disk I/O (for reading the file) and CPU load (for your computations). So the first step would be to write your application such that the file I/O doesn't block the computation. You could do this by reading a little bit at a time from the file and handing it off to the compute thread.
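The hand-off can be sketched with a bounded queue between a reader thread and the computing thread. This is a minimal illustration under assumptions: `process_file`, the chunk size, and the queue depth are made up for the example, error handling is omitted, and `compute` stands for whatever per-chunk work you do. In CPython the reader releases the interpreter lock while it waits on the disk, so it overlaps with the computation:

```python
import queue
import threading

CHUNK_SIZE = 64 * 1024  # illustrative value
_DONE = object()        # sentinel that marks the end of the file


def _reader(path, chunks, chunk_size):
    """Read the file a little at a time and hand each chunk to the queue."""
    try:
        with open(path, "rb") as f:
            while chunk := f.read(chunk_size):
                chunks.put(chunk)  # blocks while the queue is full
    finally:
        chunks.put(_DONE)


def process_file(path, compute, chunk_size=CHUNK_SIZE, max_pending=8):
    """Apply compute() to each chunk while the reader keeps reading ahead."""
    chunks = queue.Queue(maxsize=max_pending)  # bounds memory use
    reader = threading.Thread(
        target=_reader, args=(path, chunks, chunk_size), daemon=True
    )
    reader.start()
    results = []
    while (chunk := chunks.get()) is not _DONE:
        results.append(compute(chunk))
    reader.join()
    return results
```

For example, `process_file("big.dat", len)` returns the size of every chunk, computed while the next chunks are being read.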

But now you're saying you have two cores that you want to utilize. Your second thought about parallelizing the CPU-intensive part is correct, because we can only parallelize compute tasks if we have more than one processor to use. But, it might be the case that the blocking part of your application is still the file I/O - that depends on a lot of factors, and the only way to tell what level of parallelization is appropriate is to benchmark.

SO required caveat: multithreading is hard and error-prone, and it's better to have correct code than fast code, if you can pick only one. That said, I don't advocate against threads, as you may find others on this site do.

danben answered Oct 20 '25 22:10


