 

Reading and processing a big text file of 25 GB

I have to read a big text file of roughly 25 GB and process it within 15 to 20 minutes. The file contains multiple header and footer sections.

I tried csplit to split this file on the headers, but it takes around 24 to 25 minutes to produce the per-header files, which is not acceptable at all.

I tried sequential reading and writing using BufferedReader and BufferedWriter along with FileReader and FileWriter. It takes more than 27 minutes. Again, not acceptable.
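A sketch of this buffered sequential approach (file paths and the per-line processing step are placeholders):

```java
import java.io.*;

public class SequentialCopy {
    public static void main(String[] args) throws IOException {
        // Placeholder paths; the real input is ~25 GB
        String in = args.length > 0 ? args[0] : "input.txt";
        String out = args.length > 1 ? args[1] : "output.txt";
        try (BufferedReader reader = new BufferedReader(new FileReader(in));
             BufferedWriter writer = new BufferedWriter(new FileWriter(out))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // per-line processing would go here
                writer.write(line);
                writer.newLine();
            }
        }
    }
}
```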

I tried another approach: find the start offset of each header, then run multiple threads that each read the file from a specific offset using RandomAccessFile. But no luck with that either.
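A sketch of that multi-threaded RandomAccessFile idea, assuming the header offsets have already been found (the two-entry offsets array and the processing step are placeholders):

```java
import java.io.IOException;
import java.io.RandomAccessFile;

public class ParallelSectionReader {
    // Reads the byte range [start, end) of the file into memory.
    static byte[] readSection(String path, long start, long end) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            raf.seek(start);
            byte[] buf = new byte[(int) (end - start)];
            raf.readFully(buf);
            return buf;
        }
    }

    public static void main(String[] args) throws Exception {
        String path = args[0];
        // Placeholder: real offsets come from scanning for header lines
        long[] offsets = {0, new java.io.File(path).length()};
        Thread[] threads = new Thread[offsets.length - 1];
        for (int i = 0; i < threads.length; i++) {
            final long start = offsets[i], end = offsets[i + 1];
            threads[i] = new Thread(() -> {
                try {
                    byte[] section = readSection(path, start, end);
                    // process(section) would go here
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) t.join();
    }
}
```

Note that on a single spinning disk, threads seeking to different offsets can actually be slower than one sequential pass, because the drive head keeps jumping between sections.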

How can I meet this requirement?

Possible duplicate of:

Read large files in Java

asked Jan 11 '12 by user1142292



1 Answer

Try using a larger read buffer (for example, 20 MB instead of 2 MB) to get through the data faster. Also avoid BufferedReader: the byte-to-character conversion it performs is slow when all you need is to scan raw bytes.
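A sketch of reading with a raw FileInputStream and a large byte buffer instead of a BufferedReader (the buffer size and the byte-scanning step are placeholders to adapt):

```java
import java.io.FileInputStream;
import java.io.IOException;

public class BigBufferRead {
    // Streams the whole file through a fixed byte buffer, returning total bytes read.
    static long countBytes(String path, int bufferSize) throws IOException {
        long total = 0;
        byte[] buf = new byte[bufferSize];
        try (FileInputStream in = new FileInputStream(path)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                // scan buf[0..n) here, e.g. look for header byte patterns,
                // instead of converting everything to chars
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // 20 MB buffer as suggested above; tune to your hardware
        System.out.println(countBytes(args[0], 20 * 1024 * 1024));
    }
}
```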

This question has been asked before: Read large files in Java

answered Oct 17 '22 by xikkub