How to know the number of bytes read (offset) by a BufferedReader?

Tags:

java

io

I want to read a file line by line. BufferedReader is much faster than RandomAccessFile or BufferedInputStream, but the problem is that I don't know how many bytes I have read. How can I find out the byte offset? I tried this:

String buffer;
int offset = 0;

while ((buffer = br.readLine()) != null)
    offset += buffer.getBytes().length + 1; // 1 is for line separator

It works if the file is small, but when the file becomes large, the offset becomes smaller than the actual value. How can I get the correct offset?

user1301568 asked Oct 05 '22 21:10


2 Answers

There is no simple way to do this with BufferedReader because of two effects: character encoding and line endings. On Windows, the line ending is \r\n, which is two bytes; on Unix, the line separator \n is a single byte. readLine() strips whichever terminator it finds (\n, \r or \r\n) without telling you which one it was, so afterwards you can't know how many bytes were skipped. With \r\n endings, the + 1 in the question undercounts by one byte per line, so the error grows with the size of the file.
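To make the line-ending effect concrete, here is a small sketch that repeats the question's counting loop on two ASCII-only strings (so one char is one byte). `LineEndingDrift` and `naiveOffset` are made-up names for illustration; the point is that readLine() returns the same two lines for both inputs, so the naive total cannot tell them apart.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

class LineEndingDrift {
    // Same idea as the question: add line length + 1 per line.
    // Input is ASCII-only, so length() equals the byte count of each line;
    // the "+ 1" assumes every terminator is a single-byte '\n'.
    static int naiveOffset(String text) {
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            int offset = 0;
            String line;
            while ((line = br.readLine()) != null) {
                offset += line.length() + 1;
            }
            return offset;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String unix = "ab\ncd\n";        // 6 bytes
        String windows = "ab\r\ncd\r\n"; // 8 bytes
        System.out.println(naiveOffset(unix) + " of " + unix.length());       // 6 of 6
        System.out.println(naiveOffset(windows) + " of " + windows.length()); // 6 of 8
    }
}
```

For the \r\n input the sum stays at 6 although the text is 8 bytes long: one byte is lost per line.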

Also, buffer.getBytes() only returns the correct length when your platform's default encoding happens to match the encoding of the data in the file. Whenever you convert between byte[] and String, always specify the encoding explicitly.
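A quick illustration of the encoding point, assuming a line that contains one non-ASCII character (`EncodingLength` and its two methods are hypothetical helpers): the same String has a different byte length depending on the charset you name.

```java
import java.nio.charset.StandardCharsets;

class EncodingLength {
    // Byte length of the same text under two different encodings.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    static int latin1Length(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    public static void main(String[] args) {
        String line = "h\u00e9llo"; // "héllo": 'é' is 2 bytes in UTF-8, 1 byte in ISO-8859-1
        System.out.println(utf8Length(line));   // 6
        System.out.println(latin1Length(line)); // 5
        // A bare line.getBytes() would return either value, depending on
        // the JVM's default charset, which is why the charset must be named.
    }
}
```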

You also can't simply put a counting InputStream underneath the reader, because the reader stack reads data in large chunks. After reading a first line of, say, 5 bytes, the counter on the inner InputStream would already report several kilobytes (or the whole file, if it is small), because the reader fills its internal buffer in one go.
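The sketch below demonstrates this with a hypothetical `CountingInputStream` (a plain `FilterInputStream` subclass written for this example, not a JDK class) placed underneath `InputStreamReader` and `BufferedReader`. Reading only the first 6-byte line already pulls the whole 12-byte input into the reader's buffers.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class CountingUnderReader {
    // Hypothetical helper: counts every byte pulled from the wrapped stream.
    static class CountingInputStream extends FilterInputStream {
        long count;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int b = super.read();
            if (b >= 0) count++;
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            if (n > 0) count += n;
            return n;
        }
    }

    // Bytes the counter reports after a single readLine() call.
    static long countAfterFirstLine(String text) {
        byte[] data = text.getBytes(StandardCharsets.UTF_8);
        CountingInputStream counting = new CountingInputStream(new ByteArrayInputStream(data));
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(counting, StandardCharsets.UTF_8))) {
            br.readLine();
            return counting.count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The first line "Hello\n" is 6 bytes, but the counter sees the whole input.
        System.out.println(countAfterFirstLine("Hello\nWorld\n"));
    }
}
```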

Have a look at NIO for this: read into a low-level ByteBuffer so that you can keep track of the byte offset yourself, and decode the bytes into a CharBuffer (for example with Charset.decode() or a CharsetDecoder) to split the input into lines.
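One way to follow this suggestion is sketched below, under stated assumptions: lines end in \n or \r\n, and the charset is ASCII-compatible (such as UTF-8), so a 0x0A byte can only be a line feed. `NioLineOffsets` and `lineOffsets` are illustrative names. The code scans bytes from a `ReadableByteChannel` into a `ByteBuffer`, records where each line starts, and decodes each line's bytes with `Charset.decode`, which returns a `CharBuffer`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class NioLineOffsets {
    // Returns the starting byte offset of each line and appends the decoded
    // lines to linesOut. Assumes '\n' or "\r\n" endings and an ASCII-compatible charset.
    static List<Long> lineOffsets(ReadableByteChannel ch, Charset cs, List<String> linesOut)
            throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(8192);
        ByteArrayOutputStream pending = new ByteArrayOutputStream(); // bytes of the current line
        List<Long> offsets = new ArrayList<>();
        long pos = 0;       // absolute offset of the next byte to scan
        long lineStart = 0; // absolute offset where the current line began
        while (ch.read(buf) != -1) {
            buf.flip();
            while (buf.hasRemaining()) {
                byte b = buf.get();
                pos++;
                if (b == '\n') {
                    offsets.add(lineStart);
                    emit(pending, cs, linesOut);
                    lineStart = pos; // next line starts right after the '\n'
                } else {
                    pending.write(b);
                }
            }
            buf.clear();
        }
        if (pending.size() > 0) { // last line had no trailing '\n'
            offsets.add(lineStart);
            emit(pending, cs, linesOut);
        }
        return offsets;
    }

    // Convenience overload for in-memory data.
    static List<Long> lineOffsets(byte[] data, Charset cs, List<String> linesOut) {
        try {
            return lineOffsets(Channels.newChannel(new ByteArrayInputStream(data)), cs, linesOut);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Decodes the pending line (minus a trailing '\r') via a CharBuffer and stores it.
    private static void emit(ByteArrayOutputStream pending, Charset cs, List<String> linesOut) {
        byte[] bytes = pending.toByteArray();
        int len = bytes.length;
        if (len > 0 && bytes[len - 1] == '\r') len--;
        linesOut.add(cs.decode(ByteBuffer.wrap(bytes, 0, len)).toString());
        pending.reset();
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        List<Long> offsets = lineOffsets(
                "Hello\r\nh\u00e9llo\nWorld".getBytes(StandardCharsets.UTF_8),
                StandardCharsets.UTF_8, lines);
        for (int i = 0; i < lines.size(); i++) {
            System.out.println("Line at byte " + offsets.get(i) + ": " + lines.get(i));
        }
    }
}
```

For a real file you could pass `FileChannel.open(path)` as the channel. The example input yields offsets 0, 7 and 14, because "Hello\r\n" is 7 bytes and "héllo\n" is also 7 bytes in UTF-8.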

Aaron Digulla answered Oct 10 '22 02:10


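Here's something that should work. It assumes UTF-8, but you can easily change that to any other encoding in which \r and \n are single bytes (ISO-8859-1, for example); it will not work for UTF-16.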

import java.io.*;
import java.nio.charset.StandardCharsets;

class Main {
    public static void main(final String[] args) throws Exception {
        try (ByteCountingLineReader r = new ByteCountingLineReader(
                new ByteArrayInputStream(toUtf8("Hello\r\nWorld\n")))) {
            while (true) {
                long count = r.byteCount(); // offset of the line about to be read
                String line = r.readLine();
                if (line == null) break;    // don't print a bogus "null" line at EOF
                // prints "Line at byte 0: Hello", then "Line at byte 7: World"
                System.out.println("Line at byte " + count + ": " + line);
            }
        }
    }

    // Reads '\n', "\r\n" or '\r'-terminated lines byte by byte and counts every
    // byte consumed. Only valid for ASCII-compatible encodings such as UTF-8,
    // where the bytes '\r' and '\n' never occur inside a multi-byte character.
    static class ByteCountingLineReader implements Closeable {
        InputStream in;
        long _byteCount;
        int bufferedByte = -1; // one byte of lookahead after a lone '\r'
        boolean ended;

        // in should be a buffered stream: bytes are read one at a time!
        ByteCountingLineReader(InputStream in) {
            this.in = in;
        }

        ByteCountingLineReader(File f) throws IOException {
            in = new BufferedInputStream(new FileInputStream(f), 65536);
        }

        String readLine() throws IOException {
            if (ended) return null;
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            while (true) {
                int c = read();
                if (ended && baos.size() == 0) return null;
                if (ended || c == '\n') break;
                if (c == '\r') {
                    // "\r\n" is one terminator; otherwise keep the peeked byte for the next line
                    c = read();
                    if (c != '\n' && !ended)
                        bufferedByte = c;
                    break;
                }
                baos.write(c);
            }
            return fromUtf8(baos.toByteArray());
        }

        int read() throws IOException {
            if (bufferedByte >= 0) {
                int b = bufferedByte;
                bufferedByte = -1;
                return b;
            }
            int c = in.read();
            if (c < 0) ended = true; else ++_byteCount;
            return c;
        }

        // The peeked byte was already counted, but not yet consumed.
        long byteCount() {
            return bufferedByte >= 0 ? _byteCount - 1 : _byteCount;
        }

        public void close() throws IOException {
            if (in != null) try {
                in.close();
            } finally {
                in = null;
            }
        }

        boolean ended() {
            return ended;
        }
    }

    static byte[] toUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
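A recorded offset is mostly useful for jumping back to a line later. A minimal sketch of that, assuming UTF-8 and \n or \r\n endings like the reader above (`SeekToLine`, `readLineAt` and `demo` are hypothetical names): it seeks with `RandomAccessFile` and decodes the bytes itself, because `RandomAccessFile.readLine()` maps each byte straight to a char and therefore mangles multi-byte UTF-8 characters.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

class SeekToLine {
    // Jumps to a byte offset (e.g. one recorded earlier with byteCount()) and
    // decodes the bytes up to the next '\n' as UTF-8, dropping a trailing '\r'.
    static String readLineAt(File f, long offset) {
        try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
            raf.seek(offset);
            ByteArrayOutputStream line = new ByteArrayOutputStream();
            int b;
            while ((b = raf.read()) != -1 && b != '\n') {
                line.write(b); // unbuffered single-byte reads: acceptable for one line
            }
            byte[] bytes = line.toByteArray();
            int len = bytes.length;
            if (len > 0 && bytes[len - 1] == '\r') len--;
            return new String(bytes, 0, len, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes content to a temporary file and reads back the line at the given offset.
    static String demo(String content, long offset) {
        try {
            File f = File.createTempFile("lines", ".txt");
            try {
                Files.write(f.toPath(), content.getBytes(StandardCharsets.UTF_8));
                return readLineAt(f, offset);
            } finally {
                f.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // For "Hello\r\nWorld\n" the reader above reports offsets 0 and 7.
        System.out.println(demo("Hello\r\nWorld\n", 7));
    }
}
```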
Stefan Reich answered Oct 10 '22 01:10