I want to read a file line by line. BufferedReader is much faster than RandomAccessFile or BufferedInputStream, but the problem is that I don't know how many bytes I have read. How can I find out the number of bytes read (the offset)? I tried:
String buffer;
int offset = 0;
while ((buffer = br.readLine()) != null)
offset += buffer.getBytes().length + 1; // 1 is for line separator
It works if the file is small. But when the file becomes large, offset becomes smaller than the actual value. How can I get the offset?
There is no simple way to do this with BufferedReader because of two effects: character encoding and line endings. On Windows, the line ending is \r\n, which is two bytes. On Unix, the line separator is a single byte (\n). BufferedReader handles both cases without you noticing, so after readLine() you won't know how many bytes were skipped.
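To make the line-ending problem concrete, here is a minimal sketch that applies the "line bytes + 1" estimate from the question to the same text with Windows and Unix line endings. The class name LineEndingDemo and the method estimate are made up for this illustration, and the text is read from a String rather than a file to keep the example self-contained.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class LineEndingDemo {
    // Estimates the byte length the way the question does:
    // bytes of each line plus 1 for the line separator.
    static long estimate(String text) {
        long offset = 0;
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = br.readLine()) != null) {
                offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return offset;
    }

    public static void main(String[] args) {
        String windows = "Hello\r\nWorld\r\n";
        String unix = "Hello\nWorld\n";
        // prints "12 vs 14": one byte per \r\n line ending goes missing
        System.out.println(estimate(windows) + " vs " + windows.getBytes(StandardCharsets.UTF_8).length);
        // prints "12 vs 12"
        System.out.println(estimate(unix) + " vs " + unix.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

The estimate falls one byte short for every \r\n line, which would match an offset that drifts further below the real value as the file grows.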
Also, buffer.getBytes() only returns the correct result when your default encoding and the encoding of the data in the file happen to be the same. When doing byte[] <-> String conversion of any kind, you should always specify exactly which encoding to use.
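As a small illustration of why the encoding matters for byte counts, the sketch below encodes the same string with two explicit charsets. CharsetDemo and its helper methods are names invented for this example.

```java
import java.nio.charset.StandardCharsets;

class CharsetDemo {
    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes the string occupies when encoded as ISO-8859-1.
    static int latin1Length(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    public static void main(String[] args) {
        String line = "caf\u00e9"; // "café": the last character is outside ASCII
        System.out.println(utf8Length(line));   // prints 5 (é takes two bytes)
        System.out.println(latin1Length(line)); // prints 4 (é takes one byte)
    }
}
```

With the no-argument getBytes(), which of these two numbers you get depends on the platform default, so the computed offset can differ from machine to machine.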
You also can't use a counting InputStream, because buffered readers read data in large chunks. So after reading the first line with, say, 5 bytes, the counter in the inner InputStream would report a much larger number such as 4096 or 8192, because the reader fills its whole internal buffer in one go.
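The read-ahead effect can be observed with a small experiment. CountingInputStream below is a hypothetical helper written for this sketch (it is not a JDK class), and the exact number it reports depends on the buffer sizes of the reader stack, so the code only prints it rather than assuming a particular value.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class CountingStreamDemo {
    // Hypothetical helper: counts every byte handed out by the wrapped stream.
    static class CountingInputStream extends FilterInputStream {
        long count;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int b = super.read();
            if (b >= 0) count++;
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            if (n > 0) count += n;
            return n;
        }
    }

    // Returns how many bytes the reader pulled from the stream to produce ONE line.
    static long bytesConsumedAfterFirstLine(byte[] data) {
        CountingInputStream counter = new CountingInputStream(new ByteArrayInputStream(data));
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(counter, StandardCharsets.UTF_8))) {
            br.readLine();
            return counter.count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100; i++) sb.append("Hello\n"); // 600 bytes in total
        byte[] data = sb.toString().getBytes(StandardCharsets.UTF_8);
        // The first line occupies 6 bytes, but the counter is already far past that.
        System.out.println(bytesConsumedAfterFirstLine(data));
    }
}
```

The counter tells you how far the reader has read ahead, not where the line you just received ended.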
You can have a look at NIO for this. You can use a low-level ByteBuffer to keep track of the offset and wrap that in a CharBuffer to convert the input into lines.
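A rough sketch of the byte-level idea, under some assumptions: the data is UTF-8 (or another encoding in which the bytes 0x0A and 0x0D only ever mean \n and \r), and it is already in a ByteBuffer. Instead of wrapping the buffer in a CharBuffer, this variant scans the raw bytes for line terminators and records where each line starts. NioLineOffsets, lineStarts and lineAt are names invented for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class NioLineOffsets {
    // Returns the byte offset at which each line starts.
    // Recognizes \n, \r\n and a lone \r as line terminators.
    static List<Integer> lineStarts(ByteBuffer buf) {
        List<Integer> starts = new ArrayList<>();
        int limit = buf.limit();
        int pos = 0;
        while (pos < limit) {
            starts.add(pos);
            // Advance to the next terminator byte (absolute gets leave buf's position alone).
            while (pos < limit && buf.get(pos) != '\n' && buf.get(pos) != '\r') pos++;
            if (pos < limit) {
                boolean cr = buf.get(pos) == '\r';
                pos++;
                if (cr && pos < limit && buf.get(pos) == '\n') pos++; // swallow the \n of \r\n
            }
        }
        return starts;
    }

    // Decodes the line that starts at the given byte offset, without its terminator.
    static String lineAt(ByteBuffer buf, int start) {
        int end = start;
        while (end < buf.limit() && buf.get(end) != '\n' && buf.get(end) != '\r') end++;
        ByteBuffer slice = buf.duplicate();
        slice.limit(end);
        slice.position(start);
        return StandardCharsets.UTF_8.decode(slice).toString();
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.wrap("Hello\r\nWorld\n".getBytes(StandardCharsets.UTF_8));
        for (int start : lineStarts(buf)) {
            // prints "0: Hello" and then "7: World"
            System.out.println(start + ": " + lineAt(buf, start));
        }
    }
}
```

For a real file, the buffer could come from a FileChannel instead of ByteBuffer.wrap; very large files would need to be processed in chunks, which this sketch does not attempt.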
Here's something that should work. It assumes UTF-8, but you can easily change that.
import java.io.*;
import java.nio.charset.StandardCharsets;

class main {
    public static void main(final String[] args) throws Exception {
        try (ByteCountingLineReader r = new ByteCountingLineReader(
                new ByteArrayInputStream(toUtf8("Hello\r\nWorld\n")))) {
            while (true) {
                long count = r.byteCount(); // offset of the line about to be read
                String line = r.readLine();
                if (line == null) break;    // end of input: nothing left to print
                System.out.println("Line at byte " + count + ": " + line);
            }
        }
    }

    static class ByteCountingLineReader implements Closeable {
        InputStream in;
        long _byteCount;       // bytes pulled from the underlying stream so far
        int bufferedByte = -1; // one byte of lookahead after a lone '\r', or -1 if none
        boolean ended;         // true once the underlying stream has reported EOF

        // in should be a buffered stream!
        ByteCountingLineReader(InputStream in) {
            this.in = in;
        }

        ByteCountingLineReader(File f) throws IOException {
            in = new BufferedInputStream(new FileInputStream(f), 65536);
        }

        // Returns the next line without its terminator (\n, \r\n or a lone \r),
        // or null at end of input.
        String readLine() throws IOException {
            if (ended) return null;
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            while (true) {
                int c = read();
                if (ended && baos.size() == 0) return null;
                if (ended || c == '\n') break;
                if (c == '\r') {
                    // Look one byte ahead: swallow the '\n' of "\r\n",
                    // otherwise keep the byte for the next line.
                    c = read();
                    if (c != '\n' && !ended)
                        bufferedByte = c;
                    break;
                }
                baos.write(c);
            }
            return fromUtf8(baos.toByteArray());
        }

        // Reads one byte, serving the lookahead byte first if there is one.
        int read() throws IOException {
            if (bufferedByte >= 0) {
                int b = bufferedByte;
                bufferedByte = -1;
                return b;
            }
            int c = in.read();
            if (c < 0) ended = true; else ++_byteCount;
            return c;
        }

        // Offset of the next unread byte; a pending lookahead byte does not count as read.
        long byteCount() {
            return bufferedByte >= 0 ? _byteCount - 1 : _byteCount;
        }

        public void close() throws IOException {
            if (in != null) try {
                in.close();
            } finally {
                in = null;
            }
        }

        boolean ended() {
            return ended;
        }
    }

    static byte[] toUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}