 

Does the Scanner class load the entire file into memory at once?

I often use the Scanner class to read files because it is so convenient.

      String inputFileName;
      Scanner fileScanner;

      inputFileName = "input.txt";
      fileScanner = new Scanner (new File(inputFileName));

My question is, does the above statement load the entire file into memory at once? Or do subsequent calls on the fileScanner like

      fileScanner.nextLine();

read from the file (i.e. from external storage and not from memory)? I ask because I am concerned about what might happen if the file is too large to fit into memory all at once. Thanks.

CodeBlue asked Apr 26 '12 15:04


People also ask

How does the Scanner class work?

Scanner is a simple text scanner that can parse primitive types and strings using regular expressions. It breaks its input into tokens using a delimiter pattern, which by default matches whitespace. The resulting tokens may then be converted into values of different types using the various next methods.
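To illustrate the tokenizing described above, here is a minimal sketch that scans an in-memory String instead of a file (the class name TokenDemo and the sample text are made up for illustration). As an assumption, it sets Locale.ROOT on the scanner so that "2.5" parses with a dot as the decimal separator whatever the default locale is.

```java
import java.util.Locale;
import java.util.Scanner;

// Hypothetical demo class: tokenizes a String with Scanner's default
// whitespace delimiter and converts tokens with the typed next* methods.
class TokenDemo {
    // Sums every integer token in the text and skips any other token.
    static int sumInts(String text) {
        int sum = 0;
        try (Scanner sc = new Scanner(text)) {
            sc.useLocale(Locale.ROOT);
            while (sc.hasNext()) {
                if (sc.hasNextInt()) {
                    sum += sc.nextInt(); // token converted to an int
                } else {
                    sc.next();           // not an int: consume it as a String
                }
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        try (Scanner sc = new Scanner("Ash 3 2.5 true")) {
            sc.useLocale(Locale.ROOT);          // '.' as decimal separator
            String name = sc.next();            // "Ash"
            int count = sc.nextInt();           // 3
            double height = sc.nextDouble();    // 2.5
            boolean flag = sc.nextBoolean();    // true
            System.out.println(name + " " + count + " " + height + " " + flag);
        }
        System.out.println(sumInts("1 two 3\n4")); // 8
    }
}
```

The same next/hasNext calls work unchanged when the Scanner is built from a File instead of a String.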

What is the Scanner class and why is it used?

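The Scanner class, found in the java.util package, is commonly used to read user input from the console, but it can parse text from any source such as a File, an InputStream, a String or any Readable.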

Which method in the Scanner class reads and returns an entire line of input?

nextLine() is a method in the Java Scanner class that advances the scanner past the current line and returns the rest of that line, excluding the line separator at the end. It is typically used to read user input from the console one line at a time, or to read a file line by line. To read input up to some other character or sequence, set a custom delimiter with useDelimiter() and call next() instead.
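A short sketch contrasting the two approaches, reading from a String rather than the console so that it is self-contained (LineDemo and the sample inputs are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical demo class contrasting nextLine() with next() plus a custom delimiter.
class LineDemo {
    // Returns each line of the text; nextLine() strips the line separator
    // (\n, \r\n, \r and a few Unicode separators are all recognized).
    static List<String> lines(String text) {
        List<String> out = new ArrayList<>();
        try (Scanner sc = new Scanner(text)) {
            while (sc.hasNextLine()) {
                out.add(sc.nextLine());
            }
        }
        return out;
    }

    // Returns the fields of a ';'-separated text by changing the delimiter.
    static List<String> fields(String text) {
        List<String> out = new ArrayList<>();
        try (Scanner sc = new Scanner(text).useDelimiter(";")) {
            while (sc.hasNext()) {
                out.add(sc.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(lines("Red Alder\nAsh\r\nAspen")); // [Red Alder, Ash, Aspen]
        System.out.println(fields("Ash;Aspen;Beech"));        // [Ash, Aspen, Beech]
    }
}
```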

What is the proper way of importing a Java Scanner?

There are two ways to import the Java Scanner: import the class explicitly with import java.util.Scanner;, or use a wildcard import of the whole package with import java.util.*;.


3 Answers

If you read the source code you can answer the question yourself.

It appears that the Scanner constructor in question is implemented as:

public Scanner(File source) throws FileNotFoundException {
        this((ReadableByteChannel)(new FileInputStream(source).getChannel()));
}

Later, this is wrapped into a Reader:

private static Readable makeReadable(ReadableByteChannel source, CharsetDecoder dec) {
    return Channels.newReader(source, dec, -1);
}

And it is read using a buffer of this size:

private static final int BUFFER_SIZE = 1024; // change to 1024;

As you can see in the final constructor in the construction chain:

private Scanner(Readable source, Pattern pattern) {
        assert source != null : "source should not be null";
        assert pattern != null : "pattern should not be null";
        this.source = source;
        delimPattern = pattern;
        buf = CharBuffer.allocate(BUFFER_SIZE);
        buf.limit(0);
        matcher = delimPattern.matcher(buf);
        matcher.useTransparentBounds(true);
        matcher.useAnchoringBounds(false);
        useLocale(Locale.getDefault(Locale.Category.FORMAT));
    }

So it appears that Scanner does not read the entire file at once.
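The conclusion can also be checked empirically rather than from the source. The sketch below wraps a Readable in a hypothetical CountingReadable that records how many characters Scanner has requested, then reads only the first line of roughly a million characters of input. On JDKs whose Scanner uses the 1024-character buffer quoted above, the count should stay around that size rather than approaching the full input length; the exact number is an implementation detail, so treat it as indicative only.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.CharBuffer;
import java.util.Scanner;

// Hypothetical wrapper that counts how many characters have been pulled
// from the wrapped Readable.
class CountingReadable implements Readable {
    private final Readable source;
    long charsRead = 0;

    CountingReadable(Readable source) { this.source = source; }

    @Override
    public int read(CharBuffer cb) throws IOException {
        int n = source.read(cb);
        if (n > 0) {
            charsRead += n;
        }
        return n;
    }
}

class ChunkDemo {
    // Builds an input of `lineCount` short lines, reads only the first line,
    // and returns how many characters Scanner fetched from the source.
    static long charsPulledForFirstLine(int lineCount) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < lineCount; i++) {
            sb.append("line ").append(i).append('\n');
        }
        CountingReadable counting = new CountingReadable(new StringReader(sb.toString()));
        try (Scanner sc = new Scanner(counting)) {
            sc.nextLine();
        }
        return counting.charsRead;
    }

    public static void main(String[] args) {
        int lines = 100_000; // roughly a million characters of text
        System.out.println("fetched " + charsPulledForFirstLine(lines) + " chars to read one line");
    }
}
```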

Edwin Dalorzo answered Nov 09 '22 03:11


From reading the code, it appears to load 1024 characters at a time by default. The buffer can grow for long lines of text, up to the size of the longest line you have.
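A minimal sketch of that growth, using only the public Scanner API (LongLineDemo is a hypothetical name): it reads a single line far longer than the initial 1024-character buffer and gets it back whole, followed by a short second line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Scanner;

// Hypothetical demo class: a line longer than the initial buffer is still
// returned in one piece because Scanner enlarges its buffer as needed.
class LongLineDemo {
    // Returns the length of every line in a text whose first line has
    // `length` characters and whose second line is "Ash".
    static List<Integer> lineLengths(int length) {
        char[] chars = new char[length];
        Arrays.fill(chars, 'x');
        String text = new String(chars) + "\nAsh\n";
        List<Integer> lengths = new ArrayList<>();
        try (Scanner sc = new Scanner(text)) {
            while (sc.hasNextLine()) {
                lengths.add(sc.nextLine().length()); // whole line, however long
            }
        }
        return lengths;
    }

    public static void main(String[] args) {
        System.out.println(lineLengths(100_000)); // [100000, 3]
    }
}
```

The flip side is that memory use is bounded by the longest line, not the file size, so a huge file with no line breaks would still end up mostly in memory.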

Peter Lawrey answered Nov 09 '22 01:11


In ACM programming contests, fast input reading is very important. In Java we found that something like the following is much faster:

    // Counts how often each line (tree name) occurs in input.txt.
    // Needs java.io.*, java.nio.charset.StandardCharsets and java.util.*;
    // the enclosing method must declare (or handle) IOException.
    Map<String, Integer> map = new HashMap<>();
    int trees = 0;
    try (BufferedReader in = new BufferedReader(new InputStreamReader(
            new FileInputStream("input.txt"), StandardCharsets.UTF_8))) {
        for (String s; (s = in.readLine()) != null; trees++) {
            map.merge(s, 1, Integer::sum); // add 1 to this name's count
        }
    }

In that case, the file contains tree names, one per line:

Red Alder
Ash
Aspen
Basswood
Ash
Beech
Yellow Birch
Ash
Cherry
Cottonwood

You can use StringTokenizer to extract any part of a line that you want.
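A minimal sketch of combining BufferedReader.readLine() with StringTokenizer, as is common in contest code. It reads from a StringReader instead of input.txt so that it is self-contained; TokenizerDemo, tokens and sumAll are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class: BufferedReader for lines, StringTokenizer for fields.
class TokenizerDemo {
    // Splits one line on StringTokenizer's default delimiters
    // (space, tab, newline, carriage return and form feed).
    static List<String> tokens(String line) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    // Contest-style input loop: read line by line and sum every number on every line.
    static long sumAll(String input) {
        long sum = 0;
        try (BufferedReader in = new BufferedReader(new StringReader(input))) {
            for (String line; (line = in.readLine()) != null; ) {
                StringTokenizer st = new StringTokenizer(line);
                while (st.hasMoreTokens()) {
                    sum += Long.parseLong(st.nextToken());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(tokens("Yellow Birch\t12")); // [Yellow, Birch, 12]
        System.out.println(sumAll("1 2 3\n4 5\n"));      // 15
    }
}
```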

We ran into errors when using Scanner on large files: for example, it read only 100 lines from a file with 10000 lines!

The Scanner API documentation explains why:

"A scanner can read text from any object which implements the Readable interface. If an invocation of the underlying readable's Readable.read(java.nio.CharBuffer) method throws an IOException then the scanner assumes that the end of the input has been reached. The most recent IOException thrown by the underlying readable can be retrieved via the ioException() method."
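The following sketch shows the behavior the documentation describes, under the assumption that the underlying source fails with an IOException after delivering some data. FailingReadable and IoExceptionDemo are hypothetical classes that simulate this; whether it is what happened in the large-file case above is not confirmed. The scanner simply stops as if the input had ended, and only ioException() reveals the failure.

```java
import java.io.IOException;
import java.nio.CharBuffer;
import java.util.Scanner;

// Hypothetical Readable that serves the given text and then throws IOException
// instead of signalling end of input (-1), simulating a read error mid-file.
class FailingReadable implements Readable {
    private final String data;
    private int pos = 0;

    FailingReadable(String data) { this.data = data; }

    @Override
    public int read(CharBuffer cb) throws IOException {
        if (pos >= data.length()) {
            throw new IOException("simulated read failure");
        }
        int n = Math.min(cb.remaining(), data.length() - pos);
        cb.put(data, pos, pos + n); // copy as much as fits in Scanner's buffer
        pos += n;
        return n;
    }
}

class IoExceptionDemo {
    // Reads lines until Scanner reports no more input, then reports whether
    // an IOException was swallowed along the way.
    static String summarize(int goodLines) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < goodLines; i++) {
            sb.append("line ").append(i).append('\n');
        }
        int lines = 0;
        try (Scanner sc = new Scanner(new FailingReadable(sb.toString()))) {
            while (sc.hasNextLine()) { // false once buffered input runs out and the read throws
                sc.nextLine();
                lines++;
            }
            IOException err = sc.ioException(); // null if the input simply ended
            return lines + " lines, error: " + (err == null ? "none" : err.getMessage());
        }
    }

    public static void main(String[] args) {
        System.out.println(summarize(3)); // 3 lines, error: simulated read failure
    }
}
```

Checking ioException() after the loop is a cheap way to tell a genuine end of file from a failed read, since the loop itself ends the same way in both cases.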

Good luck!

Paul Vargas answered Nov 09 '22 02:11