I have an encryption algorithm (AES) that accepts a file converted to a byte array and encrypts it. Since I am going to process very large files, the JVM may run out of memory. I am planning to read each file into multiple byte arrays, each containing part of the file, and then feed them to the algorithm one at a time. Finally, I would merge the results to produce the encrypted file.
So my question is: is there any way to read a file part by part into multiple byte arrays?
I thought I could read the file into a byte array with IOUtils.toByteArray(InputStream input) and then split that array into multiple arrays using Arrays.copyOfRange(). But I am afraid that the code that reads the whole file into a byte array will make the JVM run out of memory.
Look up cipher streams in Java. You can use them to encrypt/decrypt streams on the fly so you don't have to store the whole thing in memory. All you have to do is copy the regular FileInputStream for your source file to the CipherOutputStream that's wrapping your FileOutputStream for the encrypted sink file. IOUtils even conveniently contains a copy(InputStream, OutputStream) method to do this copy for you.
For example:
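    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.security.GeneralSecurityException;
    import java.security.Key;
    import javax.crypto.Cipher;
    import javax.crypto.CipherOutputStream;
    import org.apache.commons.io.IOUtils;

    public static void main(String[] args) throws IOException, GeneralSecurityException {
        encryptFile("exampleInput.txt", "exampleOutput.txt");
    }

    public static void encryptFile(String source, String sink)
            throws IOException, GeneralSecurityException {
        // Build the cipher first so a failure here cannot leak an open file handle.
        Cipher cipher = getEncryptionCipher();
        // try-with-resources closes both streams; closing the CipherOutputStream
        // is what flushes the final (padded) block to the sink file.
        try (FileInputStream fis = new FileInputStream(source);
             CipherOutputStream cos = new CipherOutputStream(new FileOutputStream(sink), cipher)) {
            IOUtils.copy(fis, cos);
        }
    }

    private static Cipher getEncryptionCipher() throws GeneralSecurityException {
        // Create AES cipher with whatever padding and other properties you want
        Cipher cipher = ... ;
        // Create AES secret key
        Key key = ... ;
        cipher.init(Cipher.ENCRYPT_MODE, key);
        return cipher;
    }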
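If you need to know the number of bytes that were copied, keep in mind that IOUtils.copy returns the count as an int and returns -1 once more than Integer.MAX_VALUE bytes (2 GB) have been copied. If the file sizes exceed that, use IOUtils.copyLarge instead of IOUtils.copy; it returns the count as a long.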
To decrypt the file, do the same thing, but use CipherInputStream instead of CipherOutputStream and initialize your Cipher using Cipher.DECRYPT_MODE.
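As a rough sketch of the decryption direction, shown next to the matching encryption so the round trip is visible. It assumes AES/CBC/PKCS5Padding with a caller-supplied key and IV, which the example above leaves open. The class name StreamCrypto, its method names, and the local copy helper (a dependency-free stand-in for IOUtils.copy) are hypothetical:

```java
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.security.GeneralSecurityException;

final class StreamCrypto {

    // Assumption: AES in CBC mode with PKCS5 padding; pick whatever fits your setup.
    static Cipher newCipher(int mode, byte[] key, byte[] iv) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        return cipher;
    }

    // Buffered copy that only ever holds 8 KB in memory, regardless of stream size.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    // Encrypt: plain input is copied INTO a CipherOutputStream wrapping the sink.
    static void encrypt(InputStream plain, OutputStream sink, byte[] key, byte[] iv)
            throws IOException, GeneralSecurityException {
        Cipher cipher = newCipher(Cipher.ENCRYPT_MODE, key, iv);
        try (CipherOutputStream cos = new CipherOutputStream(sink, cipher)) {
            copy(plain, cos);
        }
    }

    // Decrypt: a CipherInputStream wrapping the encrypted source is copied OUT to the sink.
    static void decrypt(InputStream encrypted, OutputStream sink, byte[] key, byte[] iv)
            throws IOException, GeneralSecurityException {
        Cipher cipher = newCipher(Cipher.DECRYPT_MODE, key, iv);
        try (CipherInputStream cis = new CipherInputStream(encrypted, cipher)) {
            copy(cis, sink);
        }
    }
}
```

For files, you would pass a FileInputStream and a FileOutputStream as the two stream arguments. The same key and IV must be used in both directions.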
Take a look here for more info on cipher streams in Java.
This will save you space because you won't need to store byte arrays of your own anymore. The only stored byte[] in this system is the internal byte[] of the Cipher, which will get cleared each time enough input is entered and an encrypted block is returned by Cipher.update, or on Cipher.doFinal when the CipherOutputStream is closed. However, you don't have to worry about any of this since it's all internal and everything is managed for you.
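Roughly what the cipher stream does for you can also be written out by hand, which is close to what the question asked for: read the file one byte array at a time and feed each part to the cipher. A minimal sketch, assuming the caller passes in an already initialized Cipher; ChunkedCipher, process, and the chunkSize parameter are hypothetical names:

```java
import javax.crypto.Cipher;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.security.GeneralSecurityException;

final class ChunkedCipher {

    // Streams `in` through `cipher` into `out`, holding at most one chunk in memory.
    static void process(Cipher cipher, InputStream in, OutputStream out, int chunkSize)
            throws IOException, GeneralSecurityException {
        byte[] chunk = new byte[chunkSize];
        int n;
        while ((n = in.read(chunk)) != -1) {
            // update() may return null when the input so far does not fill a block.
            byte[] processed = cipher.update(chunk, 0, n);
            if (processed != null) {
                out.write(processed);
            }
        }
        // doFinal() emits whatever is still buffered, including padding.
        out.write(cipher.doFinal());
    }
}
```

Unlike the cipher streams, this version lets BadPaddingException and IllegalBlockSizeException from doFinal() propagate to the caller.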
Edit: note that this can result in certain encryption exceptions being ignored, particularly BadPaddingException and IllegalBlockSizeException. This behavior can be found in the CipherOutputStream source code. (Granted, this source is from the OpenJDK, but it probably does the same thing in the Sun JDK.) Also, from the CipherOutputStream javadocs:
This class adheres strictly to the semantics, especially the failure semantics, of its ancestor classes java.io.OutputStream and java.io.FilterOutputStream. This class has exactly those methods specified in its ancestor classes, and overrides them all. Moreover, this class catches all exceptions that are not thrown by its ancestor classes.
The last sentence of that quote implies that the cryptographic exceptions are ignored, which they are. This may cause some unexpected behavior while trying to read an encrypted file, especially for block and/or padding encryption algorithms like AES. Keep in mind that when this happens you may get zero or partial output for the encrypted (or, for CipherInputStream, decrypted) file instead of an error.
If you're using IOUtils, perhaps you should consider IOUtils.copyLarge():
    public static long copyLarge(InputStream input,
                                 OutputStream output,
                                 long inputOffset,
                                 long length)
and specify a ByteArrayOutputStream as the output. You can then iterate through and load sections of your file using offset/length.
From the doc:
Copy some or all bytes from a large (over 2GB) InputStream to an OutputStream, optionally skipping input bytes.
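The offset/length idea can also be sketched without Commons IO. This is not copyLarge itself but a plain-JDK stand-in built on RandomAccessFile, so the names FileSections and readSection are hypothetical:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;
import java.util.Arrays;

final class FileSections {

    // Reads up to `length` bytes starting at `offset`.
    // Returns a shorter (possibly empty) array when the file ends first.
    static byte[] readSection(Path file, long offset, int length) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(offset);
            byte[] section = new byte[length];
            int filled = 0;
            while (filled < length) {
                int n = raf.read(section, filled, length - filled);
                if (n == -1) {
                    break; // end of file reached
                }
                filled += n;
            }
            return filled == length ? section : Arrays.copyOf(section, filled);
        }
    }
}
```

Calling readSection(file, i * chunkSize, chunkSize) for i = 0, 1, 2, ... until it returns an empty array walks the file one section at a time. If the sections are then encrypted, they should go through the same Cipher in order via Cipher.update, with Cipher.doFinal called once at the end.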