
Read a file to multiple byte arrays

Tags:

java

file

file-io

I have an AES encryption routine that takes a file converted to a byte array and encrypts it. Since I am going to process very large files, the JVM may run out of memory. I am planning to read each file into multiple byte arrays, each containing part of the file, and feed them to the algorithm one at a time. Finally, I would merge the results to produce the encrypted file.

So my question is: is there any way to read a file part by part into multiple byte arrays?

I thought I could use the following to read the file into a byte array:

    IOUtils.toByteArray(InputStream input)

And then split the array into multiple smaller arrays using:

    Arrays.copyOfRange()

But I am afraid that reading the whole file into a single byte array will make the JVM run out of memory.

LeTex asked Nov 07 '12 16:11


2 Answers

Look up cipher streams in Java. You can use them to encrypt/decrypt streams on the fly so you don't have to store the whole thing in memory. All you have to do is copy the regular FileInputStream for your source file to the CipherOutputStream that's wrapping your FileOutputStream for the encrypted sink file. IOUtils even conveniently contains a copy(InputStream, OutputStream) method to do this copy for you.

For example:

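import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.security.GeneralSecurityException;
import java.security.Key;
import javax.crypto.Cipher;
import javax.crypto.CipherOutputStream;
import org.apache.commons.io.IOUtils;

public static void main(String[] args) throws IOException, GeneralSecurityException {
    encryptFile("exampleInput.txt", "exampleOutput.txt");
}

public static void encryptFile(String source, String sink)
        throws IOException, GeneralSecurityException {
    // Build the cipher first so a setup failure cannot leave a stream open
    Cipher cipher = getEncryptionCipher();
    // try-with-resources closes cos first (writing the final cipher block), then fis
    try (FileInputStream fis = new FileInputStream(source);
         CipherOutputStream cos = new CipherOutputStream(new FileOutputStream(sink), cipher)) {
        // Copies through a small buffer; the whole file is never held in memory
        IOUtils.copy(fis, cos);
    }
}

private static Cipher getEncryptionCipher() throws GeneralSecurityException {
    // Create AES cipher with whatever padding and other properties you want
    Cipher cipher = ... ;
    // Create AES secret key
    Key key = ... ;
    cipher.init(Cipher.ENCRYPT_MODE, key);
    return cipher;
}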

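IOUtils.copy returns the number of bytes copied as an int, so for files larger than Integer.MAX_VALUE bytes (2 GB) it returns -1 instead of the real count. If you need to know the number of bytes copied for files that large, use IOUtils.copyLarge instead, which returns a long.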

To decrypt the file, do the same thing, but wrap the FileInputStream for the encrypted file in a CipherInputStream instead of wrapping the output in a CipherOutputStream, and initialize your Cipher using Cipher.DECRYPT_MODE.
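As a rough illustration of the encrypt and decrypt directions together, here is a minimal sketch. It makes several assumptions that are not in the answer above: the class name CipherStreamRoundTrip is hypothetical, the transformation is fixed to "AES/CBC/PKCS5Padding", the caller supplies the key and a 16-byte IV, and InputStream.transferTo (Java 9+) stands in for IOUtils.copy so that the example needs only the JDK.

```java
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.GeneralSecurityException;

final class CipherStreamRoundTrip {

    // Assumed transformation; the answer leaves mode and padding up to you.
    private static final String TRANSFORMATION = "AES/CBC/PKCS5Padding";

    /** Streams source through an encrypting CipherOutputStream into sink. */
    static void encryptFile(Path source, Path sink, SecretKey key, byte[] iv)
            throws IOException, GeneralSecurityException {
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
        try (InputStream in = Files.newInputStream(source);
             OutputStream out = new CipherOutputStream(Files.newOutputStream(sink), cipher)) {
            // Copies through a small buffer; closing 'out' writes the final padded block.
            in.transferTo(out);
        }
    }

    /** Streams source through a decrypting CipherInputStream into sink. */
    static void decryptFile(Path source, Path sink, SecretKey key, byte[] iv)
            throws IOException, GeneralSecurityException {
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(iv));
        try (InputStream in = new CipherInputStream(Files.newInputStream(source), cipher);
             OutputStream out = Files.newOutputStream(sink)) {
            in.transferTo(out);
        }
    }
}
```

In this sketch the same key and IV must be passed to both methods; how you generate and store them (for example, writing the IV at the start of the encrypted file) is a separate design decision that the sketch does not cover.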

Take a look here for more info on cipher streams in Java.

This will save you space because you won't need to store byte arrays of your own anymore. The only stored byte[] in this system is the internal byte[] of the Cipher, which will get cleared each time enough input is entered and an encrypted block is returned by Cipher.update, or on Cipher.doFinal when the CipherOutputStream is closed. However, you don't have to worry about any of this since it's all internal and everything is managed for you.
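To make that internal buffering concrete, the following sketch does by hand roughly what the cipher stream does for you: it feeds fixed-size chunks to Cipher.update and finishes with Cipher.doFinal. The class name ChunkedCipher, the method name process, and the 8 KB buffer size are illustrative assumptions; the Cipher is expected to be initialized by the caller.

```java
import javax.crypto.Cipher;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.security.GeneralSecurityException;

final class ChunkedCipher {

    /**
     * Pushes everything from 'in' through an already-initialized cipher
     * and writes the result to 'out'. Returns the number of input bytes read.
     */
    static long process(Cipher cipher, InputStream in, OutputStream out)
            throws IOException, GeneralSecurityException {
        byte[] buffer = new byte[8192]; // assumed chunk size; only this much input is held at once
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            // update() returns whatever complete blocks are ready, or null if none are yet
            byte[] chunk = cipher.update(buffer, 0, n);
            if (chunk != null) {
                out.write(chunk);
            }
            total += n;
        }
        // doFinal() flushes the remaining bytes and applies (or checks) the padding
        out.write(cipher.doFinal());
        return total;
    }
}
```

Because doFinal is called directly here, its checked exceptions (BadPaddingException, IllegalBlockSizeException) propagate to the caller as GeneralSecurityException.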

Edit: note that this can result in certain encryption exceptions being ignored, particularly BadPaddingException and IllegalBlockSizeException. This behavior can be found in the CipherOutputStream source code. (Granted, this source is from the OpenJDK, but it probably does the same thing in the Sun JDK.) Also, from the CipherOutputStream javadocs:

This class adheres strictly to the semantics, especially the failure semantics, of its ancestor classes java.io.OutputStream and java.io.FilterOutputStream. This class has exactly those methods specified in its ancestor classes, and overrides them all. Moreover, this class catches all exceptions that are not thrown by its ancestor classes.

The last sentence of that quote implies that the cryptographic exceptions are swallowed, which they are. This may cause unexpected behavior when reading an encrypted file, especially with block ciphers and padded modes such as AES. Keep in mind that when this happens you will get zero or partial output for the encrypted (or, with CipherInputStream, decrypted) file rather than an exception.

Brian answered Nov 20 '22 07:11


If you're using IOUtils, perhaps you should consider IOUtils.copyLarge():

public static long copyLarge(InputStream input,
                             OutputStream output,
                             long inputOffset,
                             long length)

and specify a ByteArrayOutputStream as the output. You can then iterate through the file and load one section at a time using the offset and length.

From the doc:

Copy some or all bytes from a large (over 2GB) InputStream to an OutputStream, optionally skipping input bytes.
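For comparison, the same section-by-section idea can be sketched with the JDK alone. This is not IOUtils.copyLarge; it swaps in InputStream.readNBytes (Java 9+) to fill one reusable buffer per section. The names ChunkReader and forEachChunk are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.function.Consumer;

final class ChunkReader {

    /**
     * Reads 'in' in sections of at most chunkSize bytes and hands each
     * section to 'handler' as its own byte array.
     */
    static void forEachChunk(InputStream in, int chunkSize, Consumer<byte[]> handler)
            throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        byte[] buffer = new byte[chunkSize];
        int n;
        // readNBytes returns fewer than chunkSize bytes only at end of stream, and 0 once it is exhausted
        while ((n = in.readNBytes(buffer, 0, chunkSize)) > 0) {
            // Copy so the handler may keep the array; the last section may be shorter
            handler.accept(Arrays.copyOf(buffer, n));
        }
    }
}
```

Only one section is in memory at a time as long as the handler does not retain the arrays it receives, for example when each section is passed straight to Cipher.update.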

Brian Agnew answered Nov 20 '22 07:11