
Java: Read a Large Text File With 70 Million Lines of Text

Tags:

java

io

I have a big text file with 70 million lines of text. I have to read the file line by line.

I used two different approaches:

    InputStreamReader isr = new InputStreamReader(new FileInputStream(FilePath), "unicode");
    BufferedReader br = new BufferedReader(isr);
    while ((cur = br.readLine()) != null);

and

    LineIterator it = FileUtils.lineIterator(new File(FilePath), "unicode");
    while (it.hasNext())
        cur = it.nextLine();

Is there another approach that can make this task faster?

zwang asked Dec 26 '12 07:12




1 Answer

1) I am sure there is no difference in speed; both use FileInputStream and buffering internally.

2) You can take measurements and see for yourself
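A rough way to take such measurements is sketched below. It times the BufferedReader and Scanner loops from this answer on the same file by counting lines. The class name ReadTiming, the method names, and the default path "test.txt" are made up for illustration, and a wall-clock loop like this only gives a ballpark figure (the OS file cache and JIT warm-up will affect the first run), so treat it as a starting point rather than a proper benchmark.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Scanner;

// Hypothetical helper class, not part of the original answer.
class ReadTiming {

    /** Counts lines using a BufferedReader. */
    static long countWithBufferedReader(Path file) throws IOException {
        long count = 0;
        try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            while (br.readLine() != null) {
                count++;
            }
        }
        return count;
    }

    /** Counts lines using a Scanner. */
    static long countWithScanner(Path file) throws IOException {
        long count = 0;
        try (Scanner sc = new Scanner(file, "UTF-8")) {
            while (sc.hasNextLine()) {
                sc.nextLine();
                count++;
            }
            // Scanner swallows I/O errors, so surface them explicitly.
            if (sc.ioException() != null) {
                throw sc.ioException();
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default file name; pass a real path as the first argument.
        Path file = Paths.get(args.length > 0 ? args[0] : "test.txt");
        // Repeat a few times so later runs are less affected by warm-up.
        for (int run = 0; run < 3; run++) {
            long t0 = System.nanoTime();
            long a = countWithBufferedReader(file);
            long t1 = System.nanoTime();
            long b = countWithScanner(file);
            long t2 = System.nanoTime();
            System.out.printf("run %d: BufferedReader %d lines in %d ms, Scanner %d lines in %d ms%n",
                    run, a, (t1 - t0) / 1_000_000, b, (t2 - t1) / 1_000_000);
        }
    }
}
```

Run it as `java ReadTiming yourfile.txt` and compare the later runs rather than the first one.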

3) Though there is no performance benefit, I like the Java 7 approach:

    try (BufferedReader br = Files.newBufferedReader(Paths.get("test.txt"), StandardCharsets.UTF_8)) {
        for (String line = null; (line = br.readLine()) != null;) {
            // process the line here
        }
    }

4) Scanner-based version:

    try (Scanner sc = new Scanner(new File("test.txt"), "UTF-8")) {
        while (sc.hasNextLine()) {
            String line = sc.nextLine();
        }
        // note that Scanner suppresses exceptions
        if (sc.ioException() != null) {
            throw sc.ioException();
        }
    }

5) This may be faster than the rest:

    try (SeekableByteChannel ch = Files.newByteChannel(Paths.get("test.txt"))) {
        ByteBuffer bb = ByteBuffer.allocateDirect(1000);
        StringBuilder line = new StringBuilder(); // a line may span several reads
        while (ch.read(bb) != -1) {               // read() returns -1 at end of file
            bb.flip();                            // switch the buffer to reading
            // add chars to line
            // ...
            bb.clear();                           // make room for the next read
        }
    }

It requires a bit more coding, but it can be noticeably faster because of ByteBuffer.allocateDirect: a direct buffer lets the OS read bytes from the file into the ByteBuffer directly, without an intermediate copy.
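To show what the "bit of coding" might look like, here is one possible way to fill in the channel-based loop. It is only a sketch under a few assumptions: the file is UTF-8 (so a '\n' byte always ends a line; this would not hold for the UTF-16 "unicode" encoding in the question), lines end in "\n" or "\r\n", and the names ChannelLineReader and forEachLine are invented for this example. It is not tuned, so whether it actually beats BufferedReader has to be measured.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Consumer;

// Hypothetical helper class, not part of the original answer.
class ChannelLineReader {

    /** Reads the file through a direct buffer and passes each line to the handler. */
    static void forEachLine(Path file, Consumer<String> handler) throws IOException {
        try (SeekableByteChannel ch = Files.newByteChannel(file)) {
            ByteBuffer bb = ByteBuffer.allocateDirect(64 * 1024);
            // Bytes of the current, not yet terminated line (it may span reads).
            ByteArrayOutputStream pending = new ByteArrayOutputStream();
            while (ch.read(bb) != -1) {
                bb.flip(); // switch from filling the buffer to draining it
                while (bb.hasRemaining()) {
                    byte b = bb.get();
                    if (b == '\n') {
                        emit(pending, handler);
                    } else {
                        pending.write(b);
                    }
                }
                bb.clear(); // make room for the next read
            }
            // Last line without a trailing newline.
            if (pending.size() > 0) {
                emit(pending, handler);
            }
        }
    }

    /** Decodes the pending bytes as one UTF-8 line, dropping a trailing '\r'. */
    private static void emit(ByteArrayOutputStream pending, Consumer<String> handler) {
        byte[] bytes = pending.toByteArray();
        int len = bytes.length;
        if (len > 0 && bytes[len - 1] == '\r') {
            len--;
        }
        handler.accept(new String(bytes, 0, len, StandardCharsets.UTF_8));
        pending.reset();
    }
}
```

Usage would be something like `ChannelLineReader.forEachLine(Paths.get("test.txt"), line -> { /* process */ });`. Decoding whole lines from raw bytes avoids splitting a multi-byte character across two reads.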

6) Parallel processing may increase speed further, depending on the storage and CPU. Make a big byte buffer and run several tasks that read bytes from the file into that buffer in parallel. When they are done, find the first end of line, make a String, find the next...

Evgeniy Dorofeev answered Oct 05 '22 14:10
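A minimal sketch of that idea follows, with loud caveats: it assumes the whole file fits in a single byte array (under 2 GB, which a 70-million-line file may well exceed, in which case you would process it region by region), that the text is UTF-8, and that the storage benefits from concurrent reads at all. The names ParallelRead, readAll and forEachLine are hypothetical. It relies on FileChannel.read(ByteBuffer, long), which reads at an explicit position and is safe to call from several threads.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical helper class, not part of the original answer.
class ParallelRead {

    /** Reads the file into one array using `tasks` (>= 1) concurrent positional reads. */
    static byte[] readAll(Path file, int tasks)
            throws IOException, InterruptedException, ExecutionException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            if (size > Integer.MAX_VALUE - 8) {
                throw new IOException("file too large for a single buffer");
            }
            byte[] data = new byte[(int) size];
            int chunk = (int) ((size + tasks - 1) / tasks); // ceiling division
            ExecutorService pool = Executors.newFixedThreadPool(tasks);
            try {
                List<Future<Void>> futures = new ArrayList<>();
                for (long start = 0; start < size; start += chunk) {
                    final int from = (int) start;
                    final int len = (int) Math.min(chunk, size - start);
                    Callable<Void> task = () -> {
                        // Each task fills its own, non-overlapping slice of the array.
                        ByteBuffer slice = ByteBuffer.wrap(data, from, len);
                        long pos = from;
                        while (slice.hasRemaining()) {
                            int n = ch.read(slice, pos);
                            if (n < 0) {
                                throw new EOFException("file shrank while reading");
                            }
                            pos += n;
                        }
                        return null;
                    };
                    futures.add(pool.submit(task));
                }
                for (Future<Void> f : futures) {
                    f.get(); // wait for completion and propagate failures
                }
            } finally {
                pool.shutdown();
            }
            return data;
        }
    }

    /** Finds each end of line in the buffer and hands the decoded line to the handler. */
    static void forEachLine(byte[] data, Consumer<String> handler) {
        int start = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == '\n') {
                int end = (i > start && data[i - 1] == '\r') ? i - 1 : i;
                handler.accept(new String(data, start, end - start, StandardCharsets.UTF_8));
                start = i + 1;
            }
        }
        if (start < data.length) { // last line without a trailing newline
            handler.accept(new String(data, start, data.length - start, StandardCharsets.UTF_8));
        }
    }
}
```

Here only the reading is parallel; the line splitting is still a single pass. Measure before adopting it, since on a single spinning disk concurrent reads may not help.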