I have a large string in a file (it's encoded data, in my own custom encoding) and I want to read it and decode it into my special format. I want to know the fastest way to get to the final format. I thought of some ways but I'm not sure which would be best.
1) Read the entire string at once and then process that string.
2) Read character by character from the file and process while I am reading.
Can anyone help? Thanks
Chances are the process will be IO bound, not CPU bound, so it probably won't matter much. If it does matter, it will be because of the decode function, which isn't given in the question.
In theory there are two trade-offs, which will determine whether (1) or (2) is faster.
The assumption is that the decode is fast and so your process will be IO bound.
If reading the whole file into memory at once means less context switching, then you waste fewer CPU cycles on those context switches, so reading the whole file is faster.
If reading the file char by char means you don't prematurely yield your time on the CPU, then in theory you could use the cycles otherwise spent waiting on IO to run the decode, so reading char by char will be faster.
TIME -------------------------------------------->
IO: READ CHAR --> wait --> READ CHAR --> wait
DECODE: wait ------> DECODE --> wait ---> DECODE ...
TIME -------------------------------------------->
IO: READ CHAR --> YIELD --> READ CHAR --> wait
DECODE: wait ------> YIELD --> DECODE ---> wait DECODE ---> ...
TIME -------------------------------------------->
IO: READ CHAR ..... READ CHAR --> FINISH
DECODE: -----------------------------> DECODE --->
If your decode is really slow then a producer-consumer model would probably be faster. Your best bet is to use a BufferedReader, which will do as much IO as it can while wasting/yielding the fewest CPU cycles.
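For the slow-decode case, a producer-consumer setup could look roughly like the sketch below. It is only an illustration: decodeChunk is a hypothetical stand-in (it just upper-cases) because the real decode function isn't given in the question, and the class name, chunk size and queue capacity are arbitrary choices.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Arrays;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicReference;

class ProducerConsumerSketch {
    // Sentinel chunk that marks end of input (compared by reference).
    private static final char[] EOF = new char[0];

    // Hypothetical stand-in for the real (slow) decoder: upper-cases each char.
    static void decodeChunk(char[] chunk, StringBuilder out) {
        for (char c : chunk) {
            out.append(Character.toUpperCase(c));
        }
    }

    static String decode(Reader source, int chunkSize) {
        // Small bounded queue: the reader blocks if the decoder falls behind.
        BlockingQueue<char[]> queue = new ArrayBlockingQueue<>(4);
        AtomicReference<Exception> failure = new AtomicReference<>();

        // Producer: reads chunks and hands them to the decoding thread.
        Thread producer = new Thread(() -> {
            try (Reader in = source) {
                char[] buf = new char[chunkSize];
                int n;
                while ((n = in.read(buf)) != -1) {
                    if (n > 0) {
                        queue.put(Arrays.copyOf(buf, n));
                    }
                }
            } catch (IOException | InterruptedException e) {
                failure.set(e);
            } finally {
                try {
                    queue.put(EOF); // always signal completion, even on failure
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        producer.start();

        // Consumer: decodes on the calling thread while the producer keeps reading.
        StringBuilder out = new StringBuilder();
        try {
            for (char[] chunk = queue.take(); chunk != EOF; chunk = queue.take()) {
                decodeChunk(chunk, out);
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while decoding", e);
        }
        if (failure.get() != null) {
            throw new IllegalStateException("reading failed", failure.get());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode(new StringReader("hello world"), 4));
    }
}
```

Whether this beats a plain buffered loop depends entirely on how expensive the real decode is, so it is worth measuring before committing to the extra thread.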
It's fine to use a BufferedReader or BufferedInputStream and then process character by character; the buffer will read in multiple characters at a time transparently. This should give good enough performance for typical requirements.
Reading the whole file into a string is called "slurping" and, given the memory overhead, is generally considered a last resort for file processing. If you are processing the in-memory string character by character anyway, it may not even have a detectable speed benefit, since all you are doing is maintaining your own (very large) buffer.
With a BufferedReader or BufferedInputStream you can adjust the buffer size so it can be large if really necessary.
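As a minimal sketch of that approach: the loop below reads one char at a time through a BufferedReader whose buffer size is passed in explicitly. decodeChar is a hypothetical placeholder (it only upper-cases), since the actual decoding isn't shown in the question, and the class and method names are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class BufferedDecodeSketch {
    // Hypothetical stand-in for the custom decoder: upper-cases a single char.
    static char decodeChar(char c) {
        return Character.toUpperCase(c);
    }

    // Processes the input char by char; the BufferedReader refills its
    // internal buffer (bufferSize chars) in bulk behind the scenes.
    static String decodeAll(Reader source, int bufferSize) {
        StringBuilder out = new StringBuilder();
        try (BufferedReader in = new BufferedReader(source, bufferSize)) {
            int c;
            while ((c = in.read()) != -1) {
                out.append(decodeChar((char) c));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // For a real file, pass e.g. a java.io.FileReader instead of a StringReader.
        System.out.println(decodeAll(new StringReader("abc"), 64 * 1024));
    }
}
```

The single-argument BufferedReader constructor uses a default buffer size, which is often large enough; the second argument only matters if profiling shows a bigger buffer helps.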
Given your file size (20-30 MB), note also that a Java char is 16 bits. Depending on the encoding of the file, this matters: for an ASCII text file, or a UTF-8 file with few extended characters, you must allow for roughly double the file size in memory on typical JVM implementations.
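The doubling can be checked with some quick arithmetic. The sketch below compares the UTF-8 size of a piece of text with the space its chars take at 16 bits each; it counts only the character payload, not object overhead, and the real heap footprint depends on the JVM. The class and method names are illustrative.

```java
import java.nio.charset.StandardCharsets;

class CharFootprintSketch {
    // Bytes the text occupies when encoded as UTF-8 (roughly its size on disk).
    static long utf8Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Bytes needed to hold the same text as 16-bit Java chars (payload only).
    static long utf16Bytes(String text) {
        return (long) text.length() * Character.BYTES;
    }

    public static void main(String[] args) {
        String sample = "hello";
        System.out.println(utf8Bytes(sample) + " bytes as UTF-8, "
                + utf16Bytes(sample) + " bytes as 16-bit chars");
    }
}
```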