I want to read a big text file, so I decided to create four threads, have each one read 25% of the file, and then join them.
But it is not noticeably faster. Can anyone tell me whether I can use concurrent programming for this? Each record in my file has the fields name, contact, company, policyname, policynumber, and uniqueno,
and I want to put all of the data into a HashMap at the end.
Thanks.
Reading a large file is typically limited by I/O performance, not by CPU time. You can't speed up the reading by dividing it among multiple threads (doing so will more likely decrease performance, since all threads compete for the same file on the same drive). You can use concurrent programming to process the data, but that only improves performance once the data has been read.
You may, however, have some luck by dedicating one single thread to reading the file and delegating the actual processing from this thread to worker threads whenever a unit of data has been read.
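A minimal sketch of that single-reader / multiple-worker idea is below. The file name, the whitespace-separated field layout, and the choice of uniqueno as the map key are assumptions based on the question, not something prescribed by it; a ConcurrentHashMap stands in for the plain HashMap so the workers can write to it safely.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SingleReaderMultiWorker {

    public static void main(String[] args) throws IOException, InterruptedException {
        Map<String, String[]> records = new ConcurrentHashMap<>();
        ExecutorService workers = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());

        // One thread (the main thread here) does all of the I/O sequentially...
        try (BufferedReader reader = Files.newBufferedReader(Paths.get("policies.txt"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                final String record = line;
                // ...and hands each record to the pool for the CPU-bound parsing.
                workers.submit(() -> {
                    String[] fields = record.split("\\s+"); // assumed whitespace-separated fields
                    if (fields.length >= 6) {
                        records.put(fields[5], fields);     // key on uniqueno (assumed position)
                    }
                });
            }
        }

        workers.shutdown();
        workers.awaitTermination(1, TimeUnit.HOURS);
        System.out.println("Loaded " + records.size() + " records");
    }
}
```

Whether this pays off depends on how expensive the per-record parsing is; for trivial parsing the single reader thread alone may already be as fast.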
If it is a big file, chances are it is written to disk as a contiguous block, and "streaming" the data will be faster than parallel reads, which would keep moving the drive heads back and forth. To know what is fastest you need intimate knowledge of your target production environment: on high-end storage the data will likely be distributed over multiple disks, and there parallel reads might win.
The best approach, I think, is to read the file into memory in large chunks and make each chunk available as a ByteArrayInputStream for parsing.
Quite likely you will peg the CPU while parsing and handling the data. A parallel map-reduce style of processing could help spread that load over all cores, as in the sketch below.
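One way to get that map-reduce effect with plain JDK classes is a parallel stream over the lines: the read stays sequential, while the parsing and the map building are spread over the common fork-join pool. Again, the file name, field layout, and uniqueno key position are assumptions for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ParallelParse {

    public static void main(String[] args) throws IOException {
        Map<String, String[]> records;
        try (Stream<String> lines = Files.lines(Paths.get("policies.txt"))) {
            records = lines
                    .parallel()                                  // parse on all cores
                    .map(line -> line.split("\\s+"))             // assumed whitespace-separated fields
                    .filter(fields -> fields.length >= 6)
                    .collect(Collectors.toConcurrentMap(
                            fields -> fields[5],                 // uniqueno as key (assumed position)
                            fields -> fields,
                            (first, second) -> first));          // keep the first record on duplicate keys
        }
        System.out.println("Parsed " + records.size() + " records");
    }
}
```

The resulting ConcurrentMap can be used directly or copied into a regular HashMap once the stream has finished.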