
BufferedReader in a multi-core environment

I have 8 files. Each one of them is about 1.7 GB. I'm reading those files into a byte array and that operation is fast enough.

Each file is then read as follows:

BufferedReader br = new BufferedReader(new InputStreamReader(new ByteArrayInputStream(data)));

When the files are processed sequentially on a single core, each one takes about 60 seconds to complete. However, when I distribute the computation over 8 separate cores, it takes far longer than 60 seconds per file.

Since the data is all in memory and no I/O is performed, I would have presumed that it should take no longer than 60 seconds to process a single file per core. So all 8 files should complete in just over 60 seconds in total, but this is not the case.

Am I missing something about the behaviour of BufferedReader, or of any of the other readers used in the above code?

It might be worth mentioning that I'm using this code to load the files into memory first:

byte[] content = org.apache.commons.io.FileUtils.readFileToByteArray(new File(filePath));

Overall, the code looks like this:

For each file
 read the file into a byte[]
 add the byte[] to a list
end For
For each item in the list
 create a thread and pass a byte[] to it
end For
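
For reference, the pseudocode above could be sketched in Java roughly as follows. This is only an illustration under stated assumptions, not the actual code from the question: the class name `ParallelReaders`, the helper `countLines` (a stand-in for the real per-file processing), the fixed thread pool, and the UTF-8 charset are all hypothetical. Each task prints its own elapsed time, so the per-file cost can be compared between the sequential and the parallel run.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelReaders {

    // Hypothetical stand-in for the real per-file work: counts the lines
    // of one in-memory file using the same reader chain as the question.
    static long countLines(byte[] data) throws IOException {
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.UTF_8))) {
            long lines = 0;
            while (br.readLine() != null) {
                lines++;
            }
            return lines;
        }
    }

    // Submits one task per byte[] to a fixed-size pool and returns the
    // line counts in the same order as the input list.
    static List<Long> processAll(List<byte[]> files, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (byte[] data : files) {
                Callable<Long> task = () -> {
                    long start = System.nanoTime();
                    long lines = countLines(data);
                    long elapsedMs = (System.nanoTime() - start) / 1_000_000;
                    System.out.println(Thread.currentThread().getName()
                            + ": " + lines + " lines in " + elapsedMs + " ms");
                    return lines;
                };
                futures.add(pool.submit(task));
            }
            List<Long> results = new ArrayList<>();
            for (Future<Long> future : futures) {
                results.add(future.get()); // blocks until that task has finished
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Tiny sample data instead of the 1.7 GB files from the question.
        List<byte[]> files = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            files.add("a\nb\nc\n".getBytes(StandardCharsets.UTF_8));
        }
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(processAll(files, cores));
    }
}
```

Running the same `processAll` call with `threads` set to 1 and then to 8 would show whether each individual task slows down once the others run alongside it.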
DotNet asked Feb 27 '13 13:02



2 Answers

How are you actually "distributing the computation"? Is there synchronization involved? Are you simply creating 8 threads to read the 8 files?

What platform are you running on (Linux, Windows, etc.)? I have seen seemingly strange behavior from the Windows scheduler before, where it moves a single process from core to core to try to balance the load among the cores. This ended up being slower than simply letting one core be utilized more than the rest.

Brett Okken answered Sep 28 '22 14:09



How much memory is your system rocking?

8 x 1.7 GB is about 13.6 GB. Add operating system overhead, and virtual memory / paging may have to come into play, which is obviously much slower than RAM.

I appreciate you say each file is in memory, but do you actually have 16GB of free RAM or is there more going on at an abstracted level?

If every context switch also has to swap pages in and out, that would explain the increased time.
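
As a rough way to check the arithmetic above from inside the JVM, a minimal sketch might look like this. The class name `HeapBudget` and its helper methods are hypothetical, and the 1.7 GB figure is taken from the question as an approximation. Note that this only compares the raw data size with the JVM's maximum heap (`Runtime.maxMemory()`); it cannot tell whether the operating system actually backs that heap with physical RAM rather than swap.

```java
public class HeapBudget {

    // Raw bytes needed to hold fileCount arrays of bytesPerFile each.
    // Ignores object headers and whatever the processing itself allocates.
    static long requiredBytes(int fileCount, long bytesPerFile) {
        return fileCount * bytesPerFile;
    }

    // True if the raw data alone would fit within the given heap limit.
    static boolean fitsInHeap(long requiredBytes, long maxHeapBytes) {
        return requiredBytes <= maxHeapBytes;
    }

    public static void main(String[] args) {
        long perFile = 1_700_000_000L;             // ~1.7 GB, as stated in the question
        long needed = requiredBytes(8, perFile);   // 13,600,000,000 bytes
        long maxHeap = Runtime.getRuntime().maxMemory();

        System.out.println("Needed for raw data: " + needed + " bytes");
        System.out.println("JVM max heap:        " + maxHeap + " bytes");
        System.out.println("Fits in heap:        " + fitsInHeap(needed, maxHeap));
    }
}
```

If the data only fits because the heap limit is larger than the free physical memory, the OS may page parts of it out, which would match the slowdown described here.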

KingCronus answered Sep 28 '22 15:09
