
How to use AsynchronousFileChannel to read to a StringBuffer efficiently

Tags: java, utf, nio

You can use AsynchronousFileChannel to read an entire file into a String:

            // Assumes the whole file fits in one int-sized buffer (under 2 GB).
            AsynchronousFileChannel fileChannel = AsynchronousFileChannel.open(filePath, StandardOpenOption.READ);
            long len = fileChannel.size();

            ReadAttachment readAttachment = new ReadAttachment();
            readAttachment.byteBuffer = ByteBuffer.allocate((int) len);
            readAttachment.asynchronousChannel = fileChannel;

            CompletionHandler<Integer, ReadAttachment> completionHandler = new CompletionHandler<Integer, ReadAttachment>() {

                @Override
                public void completed(Integer result, ReadAttachment attachment) {
                    // Decode only the bytes actually read, with an explicit charset.
                    // UTF-8 is an assumption here; the original relied on the platform default.
                    String content = new String(attachment.byteBuffer.array(), 0, Math.max(result, 0), StandardCharsets.UTF_8);
                    try {
                        attachment.asynchronousChannel.close();
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                    completeCallback.accept(content);
                }

                @Override
                public void failed(Throwable exc, ReadAttachment attachment) {
                    exc.printStackTrace();
                    exceptionError(errorCallback, completeCallback, String.format("error while reading file [%s]: %s", path, exc.getMessage()));
                }
            };

            // Start the read at file position 0; the handler runs when it completes.
            fileChannel.read(
                    readAttachment.byteBuffer,
                    0,
                    readAttachment,
                    completionHandler);

Suppose that now I don't want to allocate a ByteBuffer for the entire file, but read line by line instead. I could use a ByteBuffer of fixed size and keep calling read repeatedly, copying and appending to a StringBuffer each time until I reach a newline. My only concern is this: because the encoding of the file I am reading could use multiple bytes per character (UTF something), the bytes read may end with an incomplete character. How can I make sure that I'm converting the right bytes into strings and not messing up the encoding?
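The repeated-read approach described above might look roughly like the following. This is only a sketch under stated assumptions: `ChunkedReader`, `readChunks`, and the `onChunk` callback are hypothetical names, not part of the original code. Each completed read hands the buffer to the callback, keeps whatever bytes the callback left unconsumed (for example an incomplete character), and issues the next read at the advanced file position.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

class ChunkedReader {
    /** Reads the file through one fixed-size buffer, calling onChunk after every read. */
    static CompletableFuture<Void> readChunks(Path file, int chunkSize, Consumer<ByteBuffer> onChunk) {
        CompletableFuture<Void> done = new CompletableFuture<>();
        final AsynchronousFileChannel channel;
        try {
            channel = AsynchronousFileChannel.open(file, StandardOpenOption.READ);
        } catch (IOException e) {
            done.completeExceptionally(e);
            return done;
        }
        ByteBuffer buffer = ByteBuffer.allocate(chunkSize);
        // The attachment carries the file position at which the read started.
        channel.read(buffer, 0L, 0L, new CompletionHandler<Integer, Long>() {
            @Override
            public void completed(Integer bytesRead, Long filePosition) {
                try {
                    if (bytesRead < 0) {          // end of file reached
                        finish(null);
                        return;
                    }
                    buffer.flip();                // expose the bytes read so far
                    onChunk.accept(buffer);       // callback consumes what it can
                    buffer.compact();             // keep unconsumed bytes for the next round
                    if (!buffer.hasRemaining()) {
                        throw new IllegalStateException("buffer full: drain it or use a bigger one");
                    }
                    long next = filePosition + bytesRead;
                    channel.read(buffer, next, next, this);
                } catch (RuntimeException e) {
                    finish(e);
                }
            }

            @Override
            public void failed(Throwable exc, Long filePosition) {
                finish(exc);
            }

            private void finish(Throwable error) {
                try {
                    channel.close();
                } catch (IOException ignored) {
                    // nothing useful to do if close fails in this sketch
                }
                if (error == null) {
                    done.complete(null);
                } else {
                    done.completeExceptionally(error);
                }
            }
        });
        return done;
    }
}
```

Bytes still sitting in the buffer when the end of the file is reached are dropped in this sketch; a real reader would need to hand them to the callback one last time.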

UPDATE: the answer is in the comments on the selected answer; it basically points to CharsetDecoder.

gotch4 asked Nov 27 '22 05:11


1 Answer

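If you have a clear ASCII separator, as you do in your case (\n), you don't need to worry about incomplete characters: in UTF-8 this character maps to a single byte, and that byte never occurs inside a multi-byte sequence. (This does not hold for UTF-16, where every character takes at least two bytes.)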

So just search for the '\n' byte in your input, then read everything before it and convert it into a String. Loop until no more newlines are found, then compact the buffer and reuse it for the next read. If you find no newline at all, you'll have to allocate a bigger buffer, copy the contents of the old one into it, and only then call read again.
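The steps above could be sketched as follows. The class and method names (`LineSplitter`, `drainLines`, `grow`) are made up for illustration, and the sketch assumes an encoding such as UTF-8 in which the newline byte never appears inside a multi-byte character. The buffer is expected to be in the state a channel read leaves it in (position at the end of the data).

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

class LineSplitter {
    /**
     * Extracts every complete line from the buffer, then compacts it so the
     * unfinished tail sits at the start, ready for the next read to append to.
     */
    static List<String> drainLines(ByteBuffer buffer, Charset charset) {
        List<String> lines = new ArrayList<>();
        buffer.flip();                               // switch from writing to reading
        int lineStart = buffer.position();
        for (int i = lineStart; i < buffer.limit(); i++) {
            if (buffer.get(i) == (byte) '\n') {      // absolute get: position is untouched
                byte[] lineBytes = new byte[i - lineStart];
                buffer.position(lineStart);
                buffer.get(lineBytes);               // copy the line without its '\n'
                lines.add(new String(lineBytes, charset));
                lineStart = i + 1;                   // skip past the separator
            }
        }
        buffer.position(lineStart);                  // everything before this was consumed
        buffer.compact();                            // move the partial line to the front
        return lines;
    }

    /** Returns a buffer of twice the capacity holding the same pending bytes. */
    static ByteBuffer grow(ByteBuffer full) {
        ByteBuffer bigger = ByteBuffer.allocate(full.capacity() * 2);
        full.flip();
        bigger.put(full);
        return bigger;
    }
}
```

A caller would run `drainLines` after each completed read and switch to `grow` only when the call returns no lines and the buffer has no space left. Note that a trailing '\r' from Windows line endings is not stripped here.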

EDIT: As mentioned in the comments, you can pass the ByteBuffer to a CharsetDecoder on the fly and translate it into a CharBuffer (then append it to a StringBuilder or whatever solution you prefer).
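A minimal sketch of the CharsetDecoder route, assuming UTF-8 input; `ChunkDecoder`, `feed`, and `decoded` are hypothetical names chosen for this example. The decoder stops before an incomplete multi-byte sequence and leaves those bytes in the ByteBuffer, so compacting the buffer carries them over to the next read.

```java
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

class ChunkDecoder {
    // One decoder instance is reused across chunks so its state carries over.
    private final CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
    private final StringBuilder text = new StringBuilder();

    /**
     * Decodes the bytes currently in the buffer (as left by a channel read).
     * Pass endOfInput = true only for the final chunk.
     */
    void feed(ByteBuffer bytes, boolean endOfInput) {
        bytes.flip();
        // Size the output so the decoder cannot run out of room.
        int maxChars = (int) Math.ceil(bytes.remaining() * decoder.maxCharsPerByte()) + 1;
        CharBuffer chars = CharBuffer.allocate(maxChars);
        CoderResult result = decoder.decode(bytes, chars, endOfInput);
        if (result.isError()) {                  // malformed or unmappable input
            try {
                result.throwException();
            } catch (CharacterCodingException e) {
                throw new UncheckedIOException(e);
            }
        }
        if (endOfInput) {
            decoder.flush(chars);                // emit anything the decoder still holds
        }
        chars.flip();
        text.append(chars);
        bytes.compact();                         // keep undecoded trailing bytes for next time
    }

    String decoded() {
        return text.toString();
    }
}
```

With this approach the line splitting can happen on the decoded characters instead of the raw bytes, which also works for encodings where '\n' is not a single byte.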

Zbynek Vyskovsky - kvr000 answered Dec 04 '22 14:12