
Problem using Base64 encoder and InputStreamReader

I have some CLOB columns in a database that I need to put Base64-encoded binary files in. These files can be large, so I need to stream them; I can't read the whole thing in at once.

I'm using org.apache.commons.codec.binary.Base64InputStream to do the encoding, and I'm running into a problem. My code is essentially this:

FileInputStream fis = new FileInputStream(file);
Base64InputStream b64is = new Base64InputStream(fis, true, -1, null);
BufferedReader reader = new BufferedReader(new InputStreamReader(b64is));

preparedStatement.setCharacterStream(1, reader);

When I run the above code, I get the following exception during execution of the update: java.io.IOException: Underlying input stream returned zero bytes. It is thrown deep in the InputStreamReader code.

Why would this not work? It seems to me like the reader would attempt to read from the base 64 stream, which would read from the file stream, and everything should be happy.

karoberts asked May 30 '10 01:05


2 Answers

This appears to be a bug in Base64InputStream. You're calling it correctly.

You should report this to the Apache commons codec project.

Simple test case:

import java.io.*;
import org.apache.commons.codec.binary.Base64InputStream;

class tmp {
  public static void main(String[] args) throws IOException {
    FileInputStream fis = new FileInputStream(args[0]);
    // Encode (doEncode = true), no line breaks (lineLength = -1), no separator.
    Base64InputStream b64is = new Base64InputStream(fis, true, -1, null);

    byte[] c = new byte[1024];
    while (true) {
      int n = b64is.read(c);
      if (n < 0) break;  // end of stream
      // A non-empty buffer must never yield a zero-byte read.
      if (n == 0) throw new IOException("returned 0!");
      for (int i = 0; i < n; i++) {
        System.out.print((char)c[i]);
      }
    }
  }
}

The read(byte[]) method of InputStream is not allowed to return 0 unless the buffer passed in has length zero. Base64InputStream does return 0 on any file whose length is a multiple of 3 bytes.

Keith Randall answered Nov 08 '22 20:11


Interesting. I did some tests here, and it indeed throws that exception when you read the Base64InputStream using an InputStreamReader, regardless of the source of the stream, but it works flawlessly when you read it as a binary stream. As Trashgod mentioned, Base64 encoding is framed. The InputStreamReader should in fact have invoked flush() on the Base64InputStream once more to check whether it returns any more data.

I don't see any other way to fix this than implementing your own Base64InputStreamReader or Base64Reader. This is actually a bug; see Keith's answer.

As a workaround, you can also store it in a BLOB instead of a CLOB in the DB and use PreparedStatement#setBinaryStream() instead. It doesn't matter whether it's stored as binary data or not. You don't want such large Base64 data to be indexable or searchable anyway.
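A minimal sketch of that BLOB workaround, assuming a plain JDBC Connection. The class name BlobWorkaround, the method storeAsBlob, and the table and column names (documents, content, id) are hypothetical and only for illustration. The stream passed in could be the Base64InputStream from the question or the raw FileInputStream.

```java
import java.io.InputStream;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class BlobWorkaround {
    // Hypothetical table and column names, for illustration only.
    private static final String SQL =
        "UPDATE documents SET content = ? WHERE id = ?";

    /**
     * Streams the given data into a BLOB column.
     * The stream is handed to the driver as bytes, so no
     * InputStreamReader is involved.
     */
    static int storeAsBlob(Connection conn, long id, InputStream data)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            ps.setBinaryStream(1, data); // binary stream instead of character stream
            ps.setLong(2, id);
            return ps.executeUpdate();   // number of rows updated
        }
    }
}
```

Because the data never passes through a Reader, the zero-byte read that upsets InputStreamReader is not triggered on this path.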


Update: since that's not an option, and getting the Apache Commons Codec guys to fix the Base64InputStream bug (which I reported as CODEC-101) might take some time, you may consider using another 3rd party Base64 API. I've found one here (public domain, so you can do whatever you want with it, even place it in your own package). I've tested it here and it works fine.

InputStream base64 = new Base64.InputStream(input, Base64.ENCODE);

Update 2: the Commons Codec guy fixed it pretty quickly.

Index: src/java/org/apache/commons/codec/binary/Base64InputStream.java
===================================================================
--- src/java/org/apache/commons/codec/binary/Base64InputStream.java (revision 950817)
+++ src/java/org/apache/commons/codec/binary/Base64InputStream.java (working copy)
@@ -145,21 +145,41 @@
         } else if (len == 0) {
             return 0;
         } else {
-            if (!base64.hasData()) {
-                byte[] buf = new byte[doEncode ? 4096 : 8192];
-                int c = in.read(buf);
-                // A little optimization to avoid System.arraycopy()
-                // when possible.
-                if (c > 0 && b.length == len) {
-                    base64.setInitialBuffer(b, offset, len);
+            int readLen = 0;
+            /*
+             Rationale for while-loop on (readLen == 0):
+             -----
+             Base64.readResults() usually returns > 0 or EOF (-1).  In the
+             rare case where it returns 0, we just keep trying.
+
+             This is essentially an undocumented contract for InputStream
+             implementors that want their code to work properly with
+             java.io.InputStreamReader, since the latter hates it when
+             InputStream.read(byte[]) returns a zero.  Unfortunately our
+             readResults() call must return 0 if a large amount of the data
+             being decoded was non-base64, so this while-loop enables proper
+             interop with InputStreamReader for that scenario.
+             -----
+             This is a fix for CODEC-101
+            */
+            while (readLen == 0) {
+                if (!base64.hasData()) {
+                    byte[] buf = new byte[doEncode ? 4096 : 8192];
+                    int c = in.read(buf);
+                    // A little optimization to avoid System.arraycopy()
+                    // when possible.
+                    if (c > 0 && b.length == len) {
+                        base64.setInitialBuffer(b, offset, len);
+                    }
+                    if (doEncode) {
+                        base64.encode(buf, 0, c);
+                    } else {
+                        base64.decode(buf, 0, c);
+                    }
                 }
-                if (doEncode) {
-                    base64.encode(buf, 0, c);
-                } else {
-                    base64.decode(buf, 0, c);
-                }
+                readLen = base64.readResults(b, offset, len);
             }
-            return base64.readResults(b, offset, len);
+            return readLen;
         }
     }

I tried it here and it works fine.
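For a Commons Codec release without this patch, the same retry-on-zero idea could probably be applied from the outside with a small wrapper stream. The sketch below uses only the JDK. RetryOnZeroInputStream is a hypothetical name, and ZeroOnceInputStream is a stand-in that imitates the buggy behaviour, not the real Base64InputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class ZeroReadWorkaround {

    /** Hypothetical wrapper: retries while the wrapped stream reports a zero-byte read. */
    static class RetryOnZeroInputStream extends FilterInputStream {
        RetryOnZeroInputStream(InputStream in) { super(in); }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            if (len == 0) return 0; // a zero-length request may legitimately return 0
            int n;
            do {
                n = in.read(b, off, len);
            } while (n == 0);       // same idea as the while-loop in the patch
            return n;
        }
    }

    /** Stand-in for the buggy stream: returns 0 once before delivering data. */
    static class ZeroOnceInputStream extends FilterInputStream {
        private boolean returnedZero = false;
        ZeroOnceInputStream(InputStream in) { super(in); }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            if (!returnedZero && len > 0) { returnedZero = true; return 0; }
            return in.read(b, off, len);
        }
    }

    /** Reads the whole stream through an InputStreamReader, as a JDBC driver might. */
    static String readAll(InputStream in) throws IOException {
        Reader reader = new InputStreamReader(in, StandardCharsets.US_ASCII);
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[256];
        int n;
        while ((n = reader.read(buf)) != -1) sb.append(buf, 0, n);
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "SGVsbG8=".getBytes(StandardCharsets.US_ASCII);
        InputStream buggy = new ZeroOnceInputStream(new ByteArrayInputStream(data));
        System.out.println(readAll(new RetryOnZeroInputStream(buggy)));
    }
}
```

In the question's code, the wrapper would sit between the Base64InputStream and the InputStreamReader. Note that the loop would spin forever on a stream that keeps returning 0 without ever reaching end of stream, so this is a stopgap, not a replacement for upgrading.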

BalusC answered Nov 08 '22 20:11