
How to quickly search a large file for a String in Java?

I am trying to search a large text file (400MB) for a particular String using the following:

File file = new File("fileName.txt");
int count = 0;
// try-with-resources closes the Scanner even if reading fails
try (Scanner scanner = new Scanner(file)) {
    while (scanner.hasNextLine()) {
        if (scanner.nextLine().contains("particularString")) {
            count++;
            System.out.println("Number of instances of String: " + count);
        }
    }
} catch (FileNotFoundException e) {
    System.out.println(e);
}

This works fine for small files; however, for this particular file and other large ones it takes far too long (more than 10 minutes).

What would be the quickest, most efficient way of doing this?

I have now changed to the following, and it completes within seconds:

int count = 0;
// try-with-resources closes the reader (and the underlying FileReader)
try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
    String line;
    while ((line = reader.readLine()) != null) {
        if (line.contains("particularString")) {
            count++;
            System.out.println("Number of instances of String " + count);
        }
    }
} catch (IOException e) {
    System.out.println(e);
}
Chief DMG asked Apr 28 '16 14:04



1 Answer

Scanner is simply not useful in this case. Under the hood, it does all kinds of input parsing, checking, caching and whatnot. If your case is simply "iterate over all lines of a file", use something that is based on a simple BufferedReader.

In your particular case, I recommend using Files.lines.

Example:

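  // Files.lines keeps the file open, so close the stream with try-with-resources
  try (Stream<String> lines = Files.lines(Paths.get("testfile.txt"))) {
      long count = lines.filter(s -> s.contains("particularString")).count();
      System.out.println(count);
  }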

(Note that this particular use of the Stream API probably does not cover what you are actually trying to achieve; unfortunately, your question does not indicate what the result of the method should be.)
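
If the goal is the total number of occurrences rather than the number of matching lines, something along these lines might be closer. This is only a sketch under assumptions the question does not confirm: the file is UTF-8, occurrences are counted without overlap, and a match never spans a line break. The class and method names (OccurrenceCounter, countInLine, countInFile) are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class; the names are illustrative only.
public class OccurrenceCounter {

    // Counts non-overlapping occurrences of needle within one line.
    public static long countInLine(String line, String needle) {
        if (needle.isEmpty()) {
            return 0; // avoid an endless loop on an empty search string
        }
        long hits = 0;
        int from = 0;
        while ((from = line.indexOf(needle, from)) != -1) {
            hits++;
            from += needle.length(); // skip past this match
        }
        return hits;
    }

    // Reads the file line by line through a BufferedReader and sums the hits.
    // Assumes UTF-8; malformed bytes would raise an IOException here.
    public static long countInFile(Path path, String needle) throws IOException {
        long total = 0;
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                total += countInLine(line, needle);
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get(args.length > 0 ? args[0] : "fileName.txt");
        System.out.println("Number of instances of String: "
                + countInFile(path, "particularString"));
    }
}
```

Printing once after the loop, instead of on every match, also keeps console output from dominating the runtime when there are many hits.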

On my system, Files.lines() or a BufferedReader takes about 15% of the Scanner runtime.
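
To check the ratio on your own machine, a rough comparison could look like the sketch below. It is not the code I used for the number above; the class name LineSearchTiming and its methods are hypothetical, and a single wall-clock run like this is only indicative (JIT warm-up and the OS file cache both skew it, so the first strategy measured may look slower than it is). All three variants are forced to UTF-8 so they decode the file the same way.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Scanner;
import java.util.stream.Stream;

// Hypothetical timing harness; names are illustrative only.
public class LineSearchTiming {

    // A search strategy that is allowed to throw IOException.
    interface Search {
        long run(Path path, String needle) throws IOException;
    }

    // Counts matching lines with Scanner, as in the question.
    public static long withScanner(Path path, String needle) throws IOException {
        long count = 0;
        try (Scanner scanner = new Scanner(path, "UTF-8")) {
            while (scanner.hasNextLine()) {
                if (scanner.nextLine().contains(needle)) {
                    count++;
                }
            }
        }
        return count;
    }

    // Counts matching lines with a plain BufferedReader.
    public static long withBufferedReader(Path path, String needle) throws IOException {
        long count = 0;
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.contains(needle)) {
                    count++;
                }
            }
        }
        return count;
    }

    // Counts matching lines with Files.lines and the Stream API.
    public static long withFilesLines(Path path, String needle) throws IOException {
        try (Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8)) {
            return lines.filter(s -> s.contains(needle)).count();
        }
    }

    // Runs one strategy once and prints the elapsed wall-clock time.
    static void time(String label, Search search, Path path, String needle) throws IOException {
        long start = System.nanoTime();
        long count = search.run(path, needle);
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println(label + ": " + count + " matching lines in " + millis + " ms");
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get(args.length > 0 ? args[0] : "testfile.txt");
        String needle = "particularString";
        time("Scanner", LineSearchTiming::withScanner, path, needle);
        time("BufferedReader", LineSearchTiming::withBufferedReader, path, needle);
        time("Files.lines", LineSearchTiming::withFilesLines, path, needle);
    }
}
```

For numbers you want to rely on, repeat the runs several times or use a proper benchmark harness instead of a single pass.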

mtj answered Oct 24 '22 08:10
