
Fastest way to read and process a string from a large file in Java?

Tags:

java

I have a large string in a file (it's encoded data, using my own custom encoding) and I want to read it and decode it into my special format. I want to know the fastest way to get to the final format. I thought of some ways but I'm not sure which would be best.

1) Read the entire string as one line and then process that string.

2) Read character by character from the file and process while reading.

Can anyone help? Thanks

omega asked Mar 16 '23 01:03


2 Answers

Chances are the process will be I/O-bound rather than CPU-bound, so the choice probably won't matter much. If it does matter, it will be because of the decode function, which isn't given in the question.

In theory there are two trade-offs that determine whether (1) or (2) is faster.

The assumption is that the decode is fast, so your process will be I/O-bound.

If reading the whole file into memory at once means less context switching, then you will waste fewer CPU cycles on those context switches, so reading the whole file is faster.

If reading the file char by char means you don't prematurely yield your time slice on the CPU, then in theory you could use the CPU cycles otherwise spent waiting on I/O to run the decode, so reading char by char will be faster.

Here are some timelines

read char by char good case

TIME    -------------------------------------------->
IO:     READ CHAR --> wait -->   READ CHAR --> wait 
DECODE: wait ------> DECODE --> wait --->  DECODE ...

read char by char bad case

TIME    -------------------------------------------->
IO:     READ CHAR --> YIELD          -->  READ CHAR --> wait 
DECODE: wait ------>  YIELD          --> DECODE --->  wait DECODE ---> ...

read whole file

TIME    -------------------------------------------->
IO:     READ CHAR .....  READ CHAR --> FINISH
DECODE: -----------------------------> DECODE --->

If your decode were really slow, then a producer-consumer model would probably be faster. Your best bet is to use a BufferedReader, which will do as much I/O as it can while wasting/yielding the fewest CPU cycles.
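The producer-consumer idea mentioned above can be sketched roughly as follows. This is only an illustration under stated assumptions: `slowDecode` is a hypothetical stand-in for the asker's decoder (it just upper-cases text), and the chunk size and queue capacity are arbitrary. One thread reads chunks while the calling thread decodes them.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Locale;
import java.util.Optional;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicReference;

class ProducerConsumerDecode {
    // Hypothetical stand-in for the real (slow) decoder. It works per character,
    // so splitting the input at arbitrary chunk boundaries is safe here; a decoder
    // with multi-character tokens would need to handle tokens that span chunks.
    static String slowDecode(String chunk) {
        return chunk.toUpperCase(Locale.ROOT);
    }

    static String decode(Reader source, int chunkChars) {
        // Bounded queue: the reader blocks when it gets too far ahead of the decoder.
        BlockingQueue<Optional<String>> queue = new ArrayBlockingQueue<>(16);
        AtomicReference<IOException> readError = new AtomicReference<>();

        Thread producer = new Thread(() -> {
            try {
                try (Reader in = source) {
                    char[] buf = new char[chunkChars];
                    int n;
                    while ((n = in.read(buf)) != -1) {
                        queue.put(Optional.of(new String(buf, 0, n)));
                    }
                } catch (IOException e) {
                    readError.set(e);
                }
                // An empty Optional marks end of input, also after a read error.
                queue.put(Optional.empty());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        StringBuilder out = new StringBuilder();
        try {
            while (true) {
                Optional<String> chunk = queue.take();
                if (!chunk.isPresent()) {
                    break;
                }
                out.append(slowDecode(chunk.get()));
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while decoding", e);
        }
        if (readError.get() != null) {
            throw new UncheckedIOException(readError.get());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode(new StringReader("hello world"), 4));
    }
}
```

Whether the extra thread pays off depends on how expensive the real decode is; for a cheap decode the queue overhead may well outweigh the gain.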

Victory answered Apr 27 '23 01:04


It's fine to use a BufferedReader or BufferedInputStream and then process character by character; the buffer will read in multiple characters at a time transparently. This should give good enough performance for typical requirements.
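As a minimal sketch of that approach, the loop below calls `read()` once per character while the BufferedReader refills its internal buffer in bulk behind the scenes. The `decodeChar` method is a hypothetical placeholder (it only upper-cases), since the real decoding isn't shown in the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class StreamingDecode {
    // Hypothetical stand-in for the asker's custom decoding step.
    static char decodeChar(char c) {
        return Character.toUpperCase(c);
    }

    // Decodes one character at a time; the underlying source is only hit
    // when BufferedReader's internal buffer runs empty.
    static String decode(Reader source) {
        StringBuilder out = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(source)) {
            int c;
            while ((c = reader.read()) != -1) {
                out.append(decodeChar((char) c));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode(new StringReader("abc")));
    }
}
```

For a real file, the `Reader` could come from `Files.newBufferedReader(path, charset)` with whichever charset the file actually uses.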

Reading the whole file into a single string is called "slurping" and, given the memory overhead, is generally considered a last resort for file processing. If you are processing the in-memory string character by character anyway, it may not even have a detectable speed benefit, since all you are doing is acting as your own (very large) buffer.
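For comparison, a slurping version might look like the sketch below. It assumes the file is UTF-8 and again uses a hypothetical `decodeChar` placeholder. Note that the raw bytes, the decoded String, and the output all sit in memory at the same time.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class SlurpDecode {
    // Hypothetical stand-in for the asker's custom decoding step.
    static char decodeChar(char c) {
        return Character.toUpperCase(c);
    }

    // Loads the entire file into one String, then decodes it in memory.
    static String slurpAndDecode(Path file) {
        final String whole;
        try {
            // Assumption: the file is UTF-8 encoded.
            whole = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        StringBuilder out = new StringBuilder(whole.length());
        for (int i = 0; i < whole.length(); i++) {
            out.append(decodeChar(whole.charAt(i)));
        }
        return out.toString();
    }
}
```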

With a BufferedReader or BufferedInputStream you can adjust the buffer size so it can be large if really necessary.
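A rough illustration of setting the buffer size: `BufferedReader` has a two-argument constructor that takes the buffer size in chars. The 1 Mi-char figure below is an arbitrary assumption, not a recommendation, and the method only counts characters as a placeholder for real processing.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class LargeBufferRead {
    // Arbitrary example size (in chars); measure before settling on a value.
    static final int BUFFER_CHARS = 1 << 20;

    // Counts characters, reading one at a time through a buffer of the given size.
    static long countChars(Reader source, int bufferChars) {
        long count = 0;
        try (BufferedReader reader = new BufferedReader(source, bufferChars)) {
            while (reader.read() != -1) {
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countChars(new StringReader("hello"), BUFFER_CHARS));
    }
}
```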

Given your file size (20-30 MB), note also that a Java char is 16 bits. Depending on the file's encoding, this matters: for an ASCII text file, or a UTF-8 file with few extended characters, you must allow for roughly double the file size in memory on typical JVM implementations.

drrob answered Apr 27 '23 00:04