Reading a large number of files quickly

Tags:

c#

.net-2.0

I have a large number (>100k) of relatively small files (1 KB to 300 KB) that I need to read and process. I'm currently looping through all the files, using File.ReadAllText to read the content, processing it, and then moving on to the next file. This is quite slow, and I was wondering if there is a good way to optimize it.
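The loop described above reads roughly like the following sketch. The folder name and the `Process` method are hypothetical placeholders, since the question doesn't give them:

```csharp
using System;
using System.IO;

class Sequential
{
    // Hypothetical stand-in for the per-file processing the question mentions;
    // here it just returns the content length.
    public static int Process(string content)
    {
        return content.Length;
    }

    static void Main()
    {
        // "data" is a hypothetical folder; the question doesn't name one.
        foreach (string file in Directory.GetFiles("data"))
        {
            string content = File.ReadAllText(file);
            Process(content); // the next read waits until processing finishes
        }
    }
}
```

The key point is that each read blocks until the previous file's processing is done, so the disk sits idle during the CPU-bound work.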

I have already tried using multiple threads, but as this seems to be I/O bound, I didn't see any improvement.

Tim asked Jul 08 '10 16:07
1 Answer

You're most likely correct: reading that many files is probably going to limit your potential speedups, since disk I/O will be the limiting factor.

That being said, you can very likely get a small improvement by handing the processing of the data off to a separate thread.

I would recommend having a single "producer" thread that reads your files. This thread will be I/O limited. As it reads each file, it can push the "processing" onto a ThreadPool thread (.NET 4 Tasks work great for this too), allowing it to immediately read the next file.

This will at least take the "processing time" out of the total runtime, making the total time for your job nearly as fast as the disk I/O alone, provided you've got an extra core or two to work with...
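A minimal sketch of that producer pattern, using `ThreadPool.QueueUserWorkItem` (available on .NET 2.0, which the question is tagged with). The folder name and the line-counting work are hypothetical stand-ins for the real processing:

```csharp
using System;
using System.IO;
using System.Threading;

class PipelinedReader
{
    static int pending;
    static readonly ManualResetEvent allDone = new ManualResetEvent(false);

    // Hypothetical stand-in for the real CPU-bound processing:
    // count the lines in the file's text.
    public static int CountLines(string text)
    {
        int lines = 1;
        foreach (char c in text)
            if (c == '\n') lines++;
        return lines;
    }

    static void ProcessContent(object state)
    {
        CountLines((string)state);
        // Signal completion when the last queued work item finishes.
        if (Interlocked.Decrement(ref pending) == 0)
            allDone.Set();
    }

    static void Main()
    {
        // "data" is a hypothetical folder; adjust to the real location.
        string[] files = Directory.GetFiles("data");
        pending = files.Length;
        if (pending == 0) return;

        // Single producer thread: all disk reads happen here, back to back,
        // so the disk is never waiting on the CPU-bound work.
        foreach (string file in files)
        {
            string content = File.ReadAllText(file);
            // Hand the processing to the pool and start the next read now.
            ThreadPool.QueueUserWorkItem(ProcessContent, content);
        }

        allDone.WaitOne(); // block until every queued work item is done
    }
}
```

The `Interlocked` counter plus `ManualResetEvent` is one simple way to wait for all pool work to finish on .NET 2.0; on .NET 4+ you could use `Task` and `Task.WaitAll` instead, as the answer notes.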

Reed Copsey answered Oct 09 '22 15:10