 

Java: Fastest way to read through a text file with 2 million lines

Tags:

java

file

Currently I am using Scanner/FileReader and looping with while (hasNextLine()). I suspect this method is not very efficient. Is there another way to read the file with similar functionality?

```java
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.util.Scanner;

public void Read(String file) {
    Scanner sc = null;
    try {
        sc = new Scanner(new FileReader(file));
        while (sc.hasNextLine()) {
            String text = sc.nextLine();
            // split into at most 3 parts: the first two words and the rest of the line
            String[] file_Array = text.split(" ", 3);
            if (file_Array[0].equalsIgnoreCase("case")) {
                //do something
            } else if (file_Array[0].equalsIgnoreCase("object")) {
                //do something
            } else if (file_Array[0].equalsIgnoreCase("classes")) {
                //do something
            } else if (file_Array[0].equalsIgnoreCase("function")) {
                //do something
            } else if (file_Array[0].equalsIgnoreCase("ignore")) {
                //do something
            } else if (file_Array[0].equalsIgnoreCase("display")) {
                //do something
            }
        }
    } catch (FileNotFoundException e) {
        System.out.println("Input file " + file + " not found");
        System.exit(1);
    } finally {
        // sc is still null if opening the file failed
        if (sc != null) {
            sc.close();
        }
    }
}
```
Asked by BeyondProgrammer, Oct 21 '13 04:10



People also ask

How do you read a file efficiently in Java?

Reading text files in Java with BufferedReader: if you want to read a file line by line, BufferedReader is a good choice, and it handles large files efficiently. Its readLine() method returns null when the end of the file is reached. Don't forget to close the reader when you are done; a try-with-resources statement does this automatically.
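A minimal sketch of the BufferedReader approach, assuming UTF-8 input. `BufferedReaderExample` and `countLines` are hypothetical names chosen for illustration. `Files.newBufferedReader` returns a `BufferedReader`, and the try-with-resources statement closes it even if an exception is thrown.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class BufferedReaderExample {

    // Counts the lines of a file using BufferedReader.readLine().
    // Assumes the file is UTF-8 encoded.
    static long countLines(Path path) throws IOException {
        long count = 0;
        // try-with-resources closes the reader when the block exits
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line;
            // readLine() returns null once the end of the file is reached
            while ((line = reader.readLine()) != null) {
                count++; // real code would process `line` here
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("lines", ".txt");
        Files.write(tmp, Arrays.asList("a", "b", "c"), StandardCharsets.UTF_8);
        System.out.println(countLines(tmp)); // prints 3
        Files.delete(tmp);
    }
}
```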

What is the easiest way to read text files line by line in Java 8?

Java 8 added Files.lines(), which reads a file line by line. It returns the lines as a Stream<String> that is populated lazily as the stream is consumed, so the whole file is never loaded into memory at once. The stream keeps the file open, so close it (for example with try-with-resources) when you are done.
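A sketch of `Files.lines()` in use, assuming UTF-8 input. The class and method names are hypothetical, and the keyword filter only mirrors the first-word check from the question. The stream is opened in try-with-resources because it holds the file open until it is closed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.stream.Stream;

class FilesLinesExample {

    // Counts lines whose first space-separated word equals `keyword`, ignoring case.
    // Lines are read lazily, one at a time, as the stream is consumed.
    static long countLinesStartingWith(Path path, String keyword) throws IOException {
        try (Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8)) {
            return lines
                    .map(line -> line.split(" ", 2)[0]) // first word of each line
                    .filter(first -> first.equalsIgnoreCase(keyword))
                    .count();
        } // the stream, and with it the file, is closed here
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("lines", ".txt");
        Files.write(tmp, Arrays.asList("case 1", "object 2", "CASE 3"), StandardCharsets.UTF_8);
        System.out.println(countLinesStartingWith(tmp, "case")); // prints 2
        Files.delete(tmp);
    }
}
```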

Is BufferedReader faster?

BufferedReader is somewhat faster than Scanner because Scanner parses its input (using regular expressions), whereas BufferedReader simply reads a sequence of characters.


2 Answers

You will find that BufferedReader.readLine() is as fast as you need: you can read millions of lines a second with it. It is more probable that your string splitting and handling is causing whatever performance problems you are encountering.
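To illustrate the point about splitting, here is one possible rework of the question's loop; it is a sketch, not the answerer's code. It assumes UTF-8 input and replaces `Scanner` with `BufferedReader`, and it replaces `split(" ", 3)` plus the `equalsIgnoreCase` chain with an `indexOf(' ')` lookup and a `switch` on the lower-cased first word. `KeywordDispatch` and `process` are hypothetical names, and the "do something" branches here only count matches. Whether this is measurably faster depends on what the real branches do, so profile before and after.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class KeywordDispatch {

    // Reads the file line by line and dispatches on the first word.
    // Returns how many lines started with each recognised keyword.
    static Map<String, Integer> process(Path path) throws IOException {
        Map<String, Integer> counts = new TreeMap<>();
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // One substring for the first word instead of a String[] from split()
                int space = line.indexOf(' ');
                String first = (space < 0 ? line : line.substring(0, space))
                        .toLowerCase(Locale.ROOT);
                switch (first) {
                    case "case":
                    case "object":
                    case "classes":
                    case "function":
                    case "ignore":
                    case "display":
                        // each keyword would get its own "do something" branch;
                        // this sketch only counts occurrences
                        counts.merge(first, 1, Integer::sum);
                        break;
                    default:
                        break; // unrecognised first word: skip the line
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("input", ".txt");
        Files.write(tmp, Arrays.asList("case a b", "Object x", "display", "other y"),
                StandardCharsets.UTF_8);
        System.out.println(process(tmp)); // prints {case=1, display=1, object=1}
        Files.delete(tmp);
    }
}
```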

Answered by user207421, Sep 22 '22 15:09



I made a gist comparing different methods:

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Scanner;
import java.util.function.Function;

public class Main {

    public static void main(String[] args) {
        String path = "resources/testfile.txt";
        measureTime("BufferedReader.readLine() into LinkedList", Main::bufferReaderToLinkedList, path);
        measureTime("BufferedReader.readLine() into ArrayList", Main::bufferReaderToArrayList, path);
        measureTime("Files.readAllLines()", Main::readAllLines, path);
        measureTime("Scanner.nextLine() into ArrayList", Main::scannerArrayList, path);
        measureTime("Scanner.nextLine() into LinkedList", Main::scannerLinkedList, path);
        measureTime("RandomAccessFile.readLine() into ArrayList", Main::randomAccessFileArrayList, path);
        measureTime("RandomAccessFile.readLine() into LinkedList", Main::randomAccessFileLinkedList, path);
        System.out.println("-----------------------------------------------------------");
    }

    // Times one call of fn on the file and prints the line count and elapsed seconds
    private static void measureTime(String name, Function<String, List<String>> fn, String path) {
        System.out.println("-----------------------------------------------------------");
        System.out.println("run: " + name);
        long startTime = System.nanoTime();
        List<String> l = fn.apply(path);
        long estimatedTime = System.nanoTime() - startTime;
        System.out.println("lines: " + l.size());
        System.out.println("estimatedTime: " + estimatedTime / 1_000_000_000.);
    }

    private static List<String> bufferReaderToLinkedList(String path) {
        return bufferReaderToList(path, new LinkedList<>());
    }

    private static List<String> bufferReaderToArrayList(String path) {
        return bufferReaderToList(path, new ArrayList<>());
    }

    private static List<String> bufferReaderToList(String path, List<String> list) {
        // try-with-resources closes the reader even if readLine() throws
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                list.add(line);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return list;
    }

    private static List<String> readAllLines(String path) {
        try {
            return Files.readAllLines(Paths.get(path)); // decodes as UTF-8
        } catch (IOException e) {
            e.printStackTrace();
            return new ArrayList<>(); // empty rather than null, so measureTime() can call size()
        }
    }

    private static List<String> randomAccessFileLinkedList(String path) {
        return randomAccessFile(path, new LinkedList<>());
    }

    private static List<String> randomAccessFileArrayList(String path) {
        return randomAccessFile(path, new ArrayList<>());
    }

    private static List<String> randomAccessFile(String path, List<String> list) {
        // RandomAccessFile.readLine() is unbuffered: it reads the file one byte at a time
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            String str;
            while ((str = file.readLine()) != null) {
                list.add(str);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return list;
    }

    private static List<String> scannerLinkedList(String path) {
        return scanner(path, new LinkedList<>());
    }

    private static List<String> scannerArrayList(String path) {
        return scanner(path, new ArrayList<>());
    }

    private static List<String> scanner(String path, List<String> list) {
        // Scanner locates line breaks with regular expressions (default charset)
        try (Scanner scanner = new Scanner(new File(path))) {
            while (scanner.hasNextLine()) {
                list.add(scanner.nextLine());
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
        return list;
    }
}
```

Results (test file of 1,000,000 lines; times in seconds):

```
Method                                        Lines     Time (s)
BufferedReader.readLine() into LinkedList     1000000   0.105118655
BufferedReader.readLine() into ArrayList      1000000   0.072696934
Files.readAllLines()                          1000000   0.087753316
Scanner.nextLine() into ArrayList             1000000   0.743121734
Scanner.nextLine() into LinkedList            1000000   0.867049885
RandomAccessFile.readLine() into ArrayList    1000000   11.413323046
RandomAccessFile.readLine() into LinkedList   1000000   11.423862897
```

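BufferedReader is the fastest, and Files.readAllLines() is also acceptable. Scanner is roughly 8 to 10 times slower because it uses regular expressions to find line breaks, and RandomAccessFile is unacceptable: its readLine() is unbuffered and reads the file one byte at a time. ArrayList is faster than LinkedList in every pairing.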
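Each method in the benchmark is timed once, so the figures include JIT warm-up and OS file-cache effects, and earlier runs may be penalized. Below is a hedged sketch of a repeat-and-take-the-minimum timer; `RepeatTiming` and `bestOfSeconds` are hypothetical helpers, and a dedicated harness such as JMH gives more reliable numbers.

```java
import java.util.function.Supplier;

class RepeatTiming {

    // Runs `task` `runs` times and returns the fastest wall-clock time in seconds.
    // Taking the minimum reduces, but does not remove, warm-up and caching noise.
    static <T> double bestOfSeconds(int runs, Supplier<T> task) {
        if (runs < 1) {
            throw new IllegalArgumentException("runs must be >= 1");
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.get();
            best = Math.min(best, System.nanoTime() - start);
        }
        return best / 1_000_000_000.0;
    }

    public static void main(String[] args) {
        double t = bestOfSeconds(5, () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) {
                sb.append(i);
            }
            return sb.toString();
        });
        System.out.println("best of 5: " + t + " s");
    }
}
```

Inside the benchmark's `Main` class, a call might look like `bestOfSeconds(5, () -> bufferReaderToArrayList(path))`.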

Answered by YAMM, Sep 26 '22 15:09
