I am seeing strange behavior with Scanner. With a particular set of files, it works when I use the Scanner(FileInputStream) constructor, but not when I use the Scanner(File) constructor.
Scanner(File)
Scanner s = new Scanner(new File("file"));
while (s.hasNextLine()) {
    System.out.println(s.nextLine());
}
Result: no output
Scanner(FileInputStream)
Scanner s = new Scanner(new FileInputStream(new File("file")));
while (s.hasNextLine()) {
    System.out.println(s.nextLine());
}
Result: the file content outputs to the console.
The input file is a Java source file containing a single class.
I double checked programmatically (in Java) that:
Typically Scanner(File) works for me in this case; I am not sure why it doesn't now.
hasNextLine() calls findWithinHorizon(), which in turn calls findPatternInBuffer(), searching for a match of the line pattern, defined as .*(\r\n|[\n\r\u2028\u2029\u0085])|.+$
The strange thing is that with both ways of constructing a Scanner (from a FileInputStream or from a File), findPatternInBuffer() returns a positive match if the file contains, for instance, the 0x0A line terminator, regardless of file size. But if the file contains a byte outside the ASCII range (i.e. >= 0x80), using FileInputStream returns true while using File returns false.
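To check that the line pattern itself is not what trips over the non-ASCII character, here is a minimal sketch that applies the pattern quoted above to already-decoded strings. The class name LinePatternDemo and the helper firstMatch are made up for illustration, and lookingAt() is only a stand-in for Scanner's more involved buffer search.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinePatternDemo {
    // The line pattern quoted above, written with regex-level escapes.
    static final Pattern LINE =
            Pattern.compile(".*(\\r\\n|[\\n\\r\\u2028\\u2029\\u0085])|.+$");

    // Hypothetical helper: returns the text matched at the start of the
    // input, or null if the pattern does not match there.
    static String firstMatch(String input) {
        Matcher m = LINE.matcher(input);
        return m.lookingAt() ? m.group() : null;
    }

    public static void main(String[] args) {
        // "a" followed by a newline: matches.
        System.out.println(firstMatch("a\n") != null);
        // Same text followed by the character U+0080: still matches.
        System.out.println(firstMatch("a\n" + (char) 0x80) != null);
        // Empty input: no match.
        System.out.println(firstMatch("") != null);
    }
}
```

Since the pattern matches once the text is a string, the difference between the two constructors presumably arises earlier, when the bytes are turned into characters.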
A very simple test case:
Create a file that contains just the character "a" followed by a newline:
# hexedit file
00000000 61 0A a.
# java Test.java
using File: true
using FileInputStream: true
Now edit the file with hexedit to append a 0x80 byte:
# hexedit file
00000000 61 0A 80 a..
# java Test.java
using File: false
using FileInputStream: true
The Java test code contains nothing beyond what is already in the question:
import java.io.*;
import java.util.*;

public class Test {
    public static void main(String[] args) {
        try {
            // Scanner built directly from the File
            File file1 = new File("file");
            Scanner s1 = new Scanner(file1);
            System.out.println("using File: " + s1.hasNextLine());
            s1.close();

            // Scanner built from a FileInputStream over the same file
            File file2 = new File("file");
            Scanner s2 = new Scanner(new FileInputStream(file2));
            System.out.println("using FileInputStream: " + s2.hasNextLine());
            s2.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
So, it turns out this is a charset issue. In fact, changing the test to:
Scanner s1 = new Scanner(file1, "latin1");
we get:
# java Test
using File: true
using FileInputStream: true
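A self-contained sketch of the same fix, in case it helps to reproduce it without hexedit. It writes the three bytes from the test case to a temporary file and reads the first line with an explicit charset. The class name ScannerCharsetDemo and the helpers writeSample and firstLine are invented for this example; "ISO-8859-1" is the canonical name for the "latin1" alias used above. The result of the default-charset case depends on the platform charset and JDK, so it is only printed, together with Scanner.ioException(), which reports the last IOException the scanner swallowed, if any.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.Scanner;

public class ScannerCharsetDemo {
    // Hypothetical helper: writes the bytes 61 0A 80 to a temp file.
    static File writeSample() {
        try {
            File f = File.createTempFile("scanner-demo", ".bin");
            f.deleteOnExit();
            Files.write(f.toPath(), new byte[] {0x61, 0x0A, (byte) 0x80});
            return f;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical helper: first line read with an explicit charset,
    // or null if the scanner reports no line at all.
    static String firstLine(File f, String charsetName) {
        try (Scanner s = new Scanner(f, charsetName)) {
            return s.hasNextLine() ? s.nextLine() : null;
        } catch (FileNotFoundException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        File f = writeSample();

        // Explicit single-byte charset: every byte decodes to a character.
        System.out.println("latin1 first line: " + firstLine(f, "ISO-8859-1"));

        // Default charset: outcome varies by platform, so just report it.
        try (Scanner s = new Scanner(f)) {
            System.out.println("default charset hasNextLine: " + s.hasNextLine());
            System.out.println("swallowed exception: " + s.ioException());
        }
    }
}
```

If the default-charset scanner reports no line, the value printed from ioException() is a good first place to look for a decoding error.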
Looking at the Oracle/Sun JDK 1.6.0_23 implementation of Scanner, the Scanner(File) constructor opens a FileInputStream, which is meant for raw binary data.
This points to a difference in the buffering and parsing technique used depending on which constructor is invoked, which directly affects your code at the call to hasNextLine().
Scanner(InputStream) uses an InputStreamReader, while Scanner(File) uses an InputStream passed to a ByteChannel (and probably reads the whole file in one jump, thus advancing the cursor, in your case).
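One plausible way the two paths could diverge on a byte like 0x80 is the decoder's malformed-input policy: a strict decoder reports an error, while a lenient one substitutes a replacement character. That Scanner's two constructors differ in exactly this way is an assumption here, not something the thread confirms. The sketch below only shows the two policies on the test bytes using CharsetDecoder with UTF-8; the class name DecoderPolicyDemo and the helper decode are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class DecoderPolicyDemo {
    // The bytes from the test case: 'a', line feed, 0x80.
    static final byte[] SAMPLE = {0x61, 0x0A, (byte) 0x80};

    // Hypothetical helper: decodes SAMPLE as UTF-8 with the given policy
    // for bad input; returns null if the decoder reports an error.
    static String decode(CodingErrorAction policy) {
        CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(policy)
                .onUnmappableCharacter(policy);
        try {
            return dec.decode(ByteBuffer.wrap(SAMPLE)).toString();
        } catch (CharacterCodingException e) {
            // REPORT policy: a lone 0x80 is not valid UTF-8.
            return null;
        }
    }

    public static void main(String[] args) {
        // Strict: the whole decode fails, including the valid "a\n" prefix.
        System.out.println("REPORT  -> " + decode(CodingErrorAction.REPORT));
        // Lenient: the bad byte becomes U+FFFD and the rest survives.
        String lenient = decode(CodingErrorAction.REPLACE);
        System.out.println("REPLACE -> length " + lenient.length());
    }
}
```

Under the strict policy nothing is returned at all, which would be consistent with hasNextLine() seeing an empty input.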