 

I want to read a big text file

Tags:

java

I want to read a big text file, so I decided to create four threads, each reading 25% of the file, and then join them.

But it is not noticeably faster. Can anyone tell me whether I can use concurrent programming for this? Each record in my file has the fields: name, contact, company, policyname, policynumber, uniqueno.

At the end I want to put all the data into a HashMap.

Thanks.

Pedantic asked Dec 22 '22


2 Answers

Reading a large file is typically limited by I/O performance, not by CPU time. You can't speed up the reading by dividing it among multiple threads (doing so will rather decrease performance, since the threads are all contending for the same file on the same drive). You can use concurrent programming to process the data, but that can only improve performance after the data has been read.

You may, however, have some luck by dedicating a single thread to reading the file and delegating the actual processing from that thread to worker threads, whenever a data unit has been read.
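A minimal sketch of that pattern: one thread reads lines sequentially while an `ExecutorService` pool parses them and fills a `ConcurrentHashMap`. The field layout and the choice of `uniqueno` as the map key are assumptions based on the record format described in the question.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SingleReaderDemo {

    // Assumed record layout (from the question):
    // name contact company policyname policynumber uniqueno
    static Map<String, String> parseAll(BufferedReader reader) throws Exception {
        ExecutorService workers = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        Map<String, String> byUniqueNo = new ConcurrentHashMap<>();

        String line;
        // One thread does the (I/O-bound) reading...
        while ((line = reader.readLine()) != null) {
            final String rec = line;
            // ...and hands each record to the (CPU-bound) workers.
            workers.submit(() -> {
                String[] fields = rec.trim().split("\\s+");
                if (fields.length == 6) {
                    byUniqueNo.put(fields[5], fields[0]); // uniqueno -> name
                }
            });
        }
        workers.shutdown();
        workers.awaitTermination(1, TimeUnit.MINUTES);
        return byUniqueNo;
    }

    public static void main(String[] args) throws Exception {
        String data = "alice 123 acme gold P1 U1\nbob 456 acme silver P2 U2\n";
        Map<String, String> m = parseAll(new BufferedReader(new StringReader(data)));
        System.out.println(m.get("U1") + " " + m.get("U2"));
    }
}
```

Whether this helps depends on how expensive the per-record parsing is; for a trivial `split`, the single-threaded version may well win.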

OregonGhost answered Dec 28 '22


If it is a big file, chances are it is written to disk as a contiguous region, and "streaming" the data will be faster than parallel reads, which would make the disk heads seek back and forth. To know what is fastest you need intimate knowledge of your target production environment: on high-end storage the data will likely be distributed over multiple disks, and parallel reads might then be faster.

The best approach, I think, is to read the file in large chunks into memory, then make it available as a ByteArrayInputStream for parsing.
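A sketch of that idea, assuming the file fits in heap memory: one sequential pass pulls the whole file into a byte array, and parsing then runs against a `ByteArrayInputStream` with no further disk access.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ChunkLoadDemo {

    // Read the whole file in one streaming pass over the disk, then expose
    // it as an in-memory stream so parsing no longer touches the drive.
    static BufferedReader inMemoryReader(Path file) throws IOException {
        byte[] bytes = Files.readAllBytes(file); // single sequential read
        return new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8));
    }
}
```

For files larger than the heap, the same idea applies per chunk: read a large block, parse it, then read the next.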

Quite likely you will peg the CPU during parsing and handling of the data, so a parallel map-reduce could help spread that load over all cores.
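One way to sketch the map-reduce idea, using Java parallel streams: parse each line on whichever core is free (map), then collect the results into a single map keyed by uniqueno (reduce). The field positions are an assumption based on the record layout in the question.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ParallelParseDemo {

    // Map: split each record into fields, in parallel across cores.
    // Reduce: collect into one map, keyed by uniqueno (field 6), value name (field 1).
    static Map<String, String> toMap(List<String> lines) {
        return lines.parallelStream()
                .map(l -> l.trim().split("\\s+"))
                .filter(f -> f.length == 6)
                .collect(Collectors.toConcurrentMap(f -> f[5], f -> f[0]));
    }
}
```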

Peter Tillemans answered Dec 28 '22