I have to read a big text file of, say, 25 GB and need to process it within 15-20 minutes. The file has multiple header and footer sections.
I tried csplit to split the file based on the headers, but it takes around 24 to 25 minutes to split it into a number of files, which is not acceptable.
I tried sequential reading and writing using BufferedReader and BufferedWriter along with FileReader and FileWriter. It takes more than 27 minutes, which is again not acceptable.
I tried another approach: get the start index of each header and then run multiple threads that read the file from those specific locations using RandomAccessFile. But no luck with this.
How can I achieve my requirement?
Possible duplicate of: Read large files in Java
To be able to open such large CSV files, you need to download and use a third-party application. If all you want is to view such files, then Large Text File Viewer is the best choice for you. For actually editing them, you can try a feature-rich text editor like Emacs, or go for a premium tool like CSV Explorer.
Reading large text files in Python: we can use the file object as an iterator. The iterator returns each line one by one, which can then be processed. This does not read the whole file into memory, so it is suitable for reading large files in Python.
Try using a large read buffer (for example, 20 MB instead of 2 MB) to process your data more quickly. Also, avoid BufferedReader: its character conversion adds overhead and slows the read down.
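As a rough illustration of the large-buffer suggestion, here is a minimal sketch that reads raw bytes in big chunks and skips character decoding entirely. The class name `LargeBufferScan`, the method `countLines`, and the 20 MB default are all hypothetical choices for this example, and counting newline bytes is only a stand-in for the real header/footer processing. Whether it meets the 15-20 minute target depends on disk throughput, so treat it as something to measure rather than a guaranteed fix.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: scans a file as raw bytes using one large, reused buffer.
final class LargeBufferScan {
    // Assumed default of 20 MB, taken from the suggestion above; tune by measuring.
    static final int DEFAULT_BUFFER_SIZE = 20 * 1024 * 1024;

    // Counts '\n' bytes in the file. No Reader is involved, so no bytes are
    // decoded into chars; replace the inner loop with the real per-byte work.
    static long countLines(Path file, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long lines = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            // read() may return fewer bytes than requested; -1 means end of file.
            while ((n = in.read(buffer)) != -1) {
                for (int i = 0; i < n; i++) {
                    if (buffer[i] == '\n') {
                        lines++;
                    }
                }
            }
        }
        return lines;
    }
}
```

It could be called as `LargeBufferScan.countLines(Paths.get("/path/to/big.txt"), LargeBufferScan.DEFAULT_BUFFER_SIZE)`, where the path is a placeholder. Working on bytes is safe for finding line breaks in ASCII or UTF-8 input, since the newline byte never appears inside a multi-byte UTF-8 sequence.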