It seems that there are many, many ways to read text files in Java (BufferedReader, DataInputStream, etc.). My personal favorite is Scanner with a File in the constructor (it's just simpler, works with mathy data processing better, and has familiar syntax).
Boris the Spider also mentioned Channel and RandomAccessFile.
Can someone explain the pros and cons of each of these methods? To be specific, when would I want to use each?
(edit) I think I should be specific and add that I have a strong preference for the Scanner method. So the real question is, when wouldn't I want to use it?
The easiest way is to use the Scanner class together with a FileReader. Simple example: Scanner in = new Scanner(new FileReader("filename.txt"));
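As a rough sketch of the "mathy data processing" case, the Scanner-plus-FileReader pattern above could be used to sum every number in a file. The class name ScannerSum, the method name sumNumbers and the choice of Locale.US (so that "2.5" parses regardless of the machine's locale) are assumptions made for illustration. Note that FileReader(String) reads with the platform default encoding.

```java
import java.io.FileReader;
import java.io.IOException;
import java.util.Locale;
import java.util.Scanner;

public class ScannerSum {
    // Hypothetical helper: sums all whitespace-separated numbers in a file,
    // skipping any token that is not a number.
    static double sumNumbers(String fileName) throws IOException {
        double total = 0;
        // try-with-resources closes the Scanner, which in turn closes the FileReader
        try (Scanner in = new Scanner(new FileReader(fileName))) {
            in.useLocale(Locale.US); // assumption: numbers use '.' as the decimal separator
            while (in.hasNext()) {
                if (in.hasNextDouble()) {
                    total += in.nextDouble();
                } else {
                    in.next(); // discard a non-numeric token
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(sumNumbers(args[0]));
    }
}
```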
You can speed up reading with BufferedInputStream. It wraps a FileInputStream and fetches data from the operating system in blocks of 8 KB rather than byte by byte, holding each block in memory. The bytes can then be read out one at a time from main memory, which is much faster.
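A minimal sketch of that wrapping, assuming you simply want to count the newline bytes in a file. The class name BufferedCount and the method countNewlines are made up for this example. Each read() call below is served from the in-memory buffer, and the underlying FileInputStream is only asked for more data when the buffer runs out.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class BufferedCount {
    // Hypothetical helper: counts '\n' bytes, reading one byte at a time
    // through a buffer instead of hitting the file for every byte.
    static long countNewlines(String fileName) throws IOException {
        long count = 0;
        try (InputStream in = new BufferedInputStream(new FileInputStream(fileName))) {
            int b;
            // read() returns the next byte as an int in 0..255, or -1 at end of file
            while ((b = in.read()) != -1) {
                if (b == '\n') {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countNewlines(args[0]));
    }
}
```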
The basic method for reading from a file is read(): each time it is called, it reads a single byte from the file and returns that byte as an integer value.
Let's start at the beginning. The question is: what do you want to do?
It's important to understand what a file actually is. A file is a collection of bytes on a disc; these bytes are your data. There are various levels of abstraction above that which Java provides:
File(Input|Output)Stream - read these bytes as a stream of byte.
File(Reader|Writer) - read from a stream of bytes as a stream of char.
Scanner - read from a stream of char and tokenise it.
RandomAccessFile - read these bytes as a searchable byte[].
FileChannel - read these bytes in a safe multithreaded way.
On top of each of those there are the Decorators; for example, you can add buffering with BufferedXXX. You could add linebreak awareness to a FileWriter with PrintWriter. You could turn an InputStream into a Reader with an InputStreamReader (currently the only way to specify character encoding for a Reader).
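To make the byte level and the char level concrete, here is a small sketch that reads the same file twice: once as raw bytes through a buffered FileInputStream, and once as characters through an InputStreamReader with an explicit charset. The class and method names (Layers, countBytes, countChars) are invented for illustration, and UTF-8 is an assumed encoding.

```java
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class Layers {
    // Byte level: FileInputStream decorated with buffering.
    static int countBytes(String fileName) throws IOException {
        try (InputStream in = new BufferedInputStream(new FileInputStream(fileName))) {
            int n = 0;
            while (in.read() != -1) {
                n++;
            }
            return n;
        }
    }

    // Char level: the same bytes decoded by an InputStreamReader (assumed UTF-8),
    // again decorated with buffering.
    static int countChars(String fileName) throws IOException {
        try (Reader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(fileName), StandardCharsets.UTF_8))) {
            int n = 0;
            while (reader.read() != -1) {
                n++;
            }
            return n;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countBytes(args[0]) + " bytes, " + countChars(args[0]) + " chars");
    }
}
```

For a file containing non-ASCII text the two counts differ, because one character can take more than one byte in UTF-8.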
So - when wouldn't I want to use it [a Scanner]?
You would not use a Scanner if you wanted to (these are some examples):
Read in data as bytes.
Copy bytes from one file to another, maybe with some filtering.
It is also worth noting that the Scanner(File file) constructor takes the File and opens a FileInputStream with the platform default encoding - this is almost always a bad idea. It is generally recognised that you should specify the encoding explicitly to avoid nasty encoding-based bugs. Further, the stream isn't buffered.
So you may be better off with:
// "file" is a placeholder for the File (or path String) you want to read
try (final Scanner scanner = new Scanner(new BufferedInputStream(new FileInputStream(file)), "UTF-8")) {
    //do stuff
}
Ugly, I know.
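One possible way to get the same effect with less nesting, offered only as a sketch: Files.newBufferedReader (Java 7) returns a buffered Reader with an explicit charset, and Scanner accepts any Readable. The class name Utf8Scanner and the method tokens are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class Utf8Scanner {
    // Hypothetical helper: collects all whitespace-separated tokens from a UTF-8 file.
    static List<String> tokens(Path path) throws IOException {
        List<String> result = new ArrayList<>();
        // The reader is buffered and decodes UTF-8 explicitly;
        // closing the Scanner also closes the underlying reader.
        try (Scanner scanner = new Scanner(Files.newBufferedReader(path, StandardCharsets.UTF_8))) {
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(tokens(Paths.get(args[0])));
    }
}
```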
It's worth noting that Java 7 provides a further layer of abstraction to remove the need to loop over files - these are in the Files class:
byte[] Files.readAllBytes(Path path)
List<String> Files.readAllLines(Path path, Charset cs)
Both these methods read the entire file into memory, which might not be appropriate. In Java 8 this is further improved by adding support for the new Stream API:
Stream<String> Files.lines(Path path, Charset cs)
Stream<Path> Files.list(Path dir)
For example, to get a Stream of words from a Path you can do:
final Stream<String> words = Files.lines(Paths.get("myFile.txt"))
        .flatMap((in) -> Arrays.stream(in.split("\\b")));
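A slightly fuller sketch of the same idea, with two deliberate changes that are my assumptions rather than part of the answer: the stream from Files.lines is opened in try-with-resources so the underlying file is closed, and lines are split on whitespace ("\\s+") instead of "\\b" so that only words are counted. The names WordCount and countWords are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.stream.Stream;

public class WordCount {
    // Hypothetical helper: counts whitespace-separated words without
    // loading the whole file into memory.
    static long countWords(Path path) throws IOException {
        // Files.lines reads lazily; closing the stream closes the file
        try (Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8)) {
            return lines
                    .flatMap(line -> Arrays.stream(line.split("\\s+")))
                    .filter(word -> !word.isEmpty()) // blank lines and leading spaces yield ""
                    .count();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countWords(Paths.get(args[0])));
    }
}
```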