I've implemented AES/CTR on Android using the built-in Cipher class. Decryption appears to be far too slow for my purposes, with a 128KB block taking approximately 6 seconds to decrypt on the emulator and 2.6 seconds on the Samsung Galaxy hardware.
I'm wondering if building OpenSSL using the NDK and calling its methods would be any faster. Does anyone have any experience with this? Part of me wants to believe that the Cipher( "AES/CTR/NoPadding" ) methods are just a wrapper around native OpenSSL calls anyway since the Linux OS backing Android should have libcrypto installed. If that were the case then trying to use the NDK would just be a waste of time as no performance gain could be expected.
I haven't bothered to time this on iOS, but even 3GS hardware decrypts so fast that a 10MB decryption appears instantaneous to the end user. I'm finding it difficult to believe that the Android implementation is really orders of magnitude worse, but maybe that's the reality.
If this is really what I'm faced with, does anyone have any ideas on other implementation strategies that would provide imperceptible response times (on 10MB files) for end users? Another developer in my office suggested, tongue in cheek, that I just use XOR encryption, which makes me want to facepalm, but I think (security concerns aside) that it would work if I did that.
Thanks!
Here's some simplified code for reference:
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class ResourceDecryptor {
    private static ThreadLocal<Cipher> mCipher;
    private byte[] mIV = new byte[ 8 ];
    private SecretKeySpec mKey;
    private String mResourcePath;
    private static final int kAESBlockSize = 16;

    public ResourceDecryptor( String resourcePath, String decryptionKey ) throws UnsupportedOperationException {
        // initialization of mKey, mIV, & mResourcePath, elided

        // store mCipher as a thread local because Cipher.getInstance() is so slow,
        // ResourceDecryptor is a static object that persists for the app lifetime
        // so this leak is intentional and ok.
        mCipher = new ThreadLocal<Cipher>() {
            protected Cipher initialValue() {
                try { return Cipher.getInstance( "AES/CTR/NoPadding" ); } catch ( Exception e ) { }
                return null;
            }
        };
    }

    public ByteBuffer read( long offset, int length ) throws GeneralSecurityException, IOException {
        Cipher cipher;
        byte[] data, iv;
        FileInputStream input;
        int prefix, readLength;

        input = null;
        // round the request out to whole AES blocks so the counter lines up
        prefix = (int)( offset % kAESBlockSize );
        readLength = ( prefix + length + kAESBlockSize - 1 ) / kAESBlockSize * kAESBlockSize;
        data = new byte[ readLength ];
        iv = new byte[ 16 ];
        try {
            input = new FileInputStream( mResourcePath );
            input.skip( offset -= prefix );
            if ( input.read( data ) != readLength ) throw new IOException( "I/O error: unable to read " + readLength + " bytes from offset " + offset );

            // counter block = 8-byte IV followed by the big-endian block index
            System.arraycopy( mIV, 0, iv, 0, 8 );
            offset /= kAESBlockSize;
            iv[ 8 ] = (byte)( offset >> 56 & 0xff );
            iv[ 9 ] = (byte)( offset >> 48 & 0xff );
            iv[ 10 ] = (byte)( offset >> 40 & 0xff );
            iv[ 11 ] = (byte)( offset >> 32 & 0xff );
            iv[ 12 ] = (byte)( offset >> 24 & 0xff );
            iv[ 13 ] = (byte)( offset >> 16 & 0xff );
            iv[ 14 ] = (byte)( offset >> 8 & 0xff );
            iv[ 15 ] = (byte)( offset & 0xff );

            if ( ( cipher = mCipher.get() ) == null ) throw new GeneralSecurityException( "Unable to initialize Cipher( \"AES/CTR/NoPadding\" )" );
            cipher.init( Cipher.DECRYPT_MODE, mKey, new IvParameterSpec( iv ) );

            long startTime = System.currentTimeMillis();
            data = cipher.doFinal( data );
            System.out.println( "decryption of " + data.length + " bytes took " + ( ( System.currentTimeMillis() - startTime ) / 1000.0 ) + "s" );
            // cipher.doFinal() takes 5.9s on Samsung Galaxy emulator for 128kb block
            // cipher.doFinal() takes 2.6s on Samsung Galaxy hardware for 128kb block
        } finally {
            if ( input != null ) try { input.close(); } catch ( Exception e ) { }
        }

        // the default order of ByteBuffer is BIG_ENDIAN so it is unnecessary to explicitly set the order()
        return ByteBuffer.wrap( data, prefix, length );
    }
}
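The counter-block arithmetic in read() can be checked on its own, away from file I/O and the device. What follows is only a minimal sketch under some assumptions: the class name CtrSeek and the helpers counterBlock and cryptRange are hypothetical names that do not appear in the code above, the data lives in a byte array rather than a file, and the block index is assumed to stay far below 2^64 so the counter never carries into the nonce half.

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper, for illustration only.
final class CtrSeek {
    private static final int AES_BLOCK_SIZE = 16;

    // Builds the 16-byte counter block: an 8-byte nonce followed by the
    // 64-bit big-endian block index, equivalent to the manual shifts in read().
    static byte[] counterBlock(byte[] nonce8, long blockIndex) {
        ByteBuffer buf = ByteBuffer.allocate(AES_BLOCK_SIZE); // big-endian by default
        buf.put(nonce8, 0, 8);
        buf.putLong(blockIndex);
        return buf.array();
    }

    // Applies AES/CTR to `length` bytes of `source` starting at `offset`.
    // In CTR mode encryption and decryption are the same operation.
    static byte[] cryptRange(byte[] key, byte[] nonce8, byte[] source, int offset, int length) {
        int prefix = offset % AES_BLOCK_SIZE; // bytes to discard from the first block
        int start = offset - prefix;          // block-aligned starting position
        try {
            Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE,
                    new SecretKeySpec(key, "AES"),
                    new IvParameterSpec(counterBlock(nonce8, start / AES_BLOCK_SIZE)));
            byte[] out = cipher.doFinal(source, start, prefix + length);
            return Arrays.copyOfRange(out, prefix, prefix + length);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AES/CTR unavailable or misconfigured", e);
        }
    }
}
```

Running the whole buffer through cryptRange from offset 0 and then processing an arbitrary sub-range of the result should reproduce the matching slice of the original data, which gives a quick sanity check that the block index lines up with the file offset.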
Yes, heavy lifting like that in a contained function is exactly where the NDK would shine. Keep in mind that Dalvik interprets Java bytecode, and on pre-2.2 Android there is no JIT, so every instruction is interpreted each time it runs, which is a huge overhead.
Even with JIT, every array access does implicit bounds checking, so there is lots and lots of overhead.
If you write this function in C++, it will be significantly faster.
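Before committing to a port, it may help to have a baseline number to compare a native build against. The following is a rough sketch, not a rigorous benchmark: CtrTiming and timeDecryptMillis are hypothetical names, the key and IV are all zeros purely for measurement, and a single pass ignores warm-up effects.

```java
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical timing helper, for illustration only.
final class CtrTiming {
    // Returns the elapsed milliseconds for one AES/CTR pass over `size` zero bytes.
    static double timeDecryptMillis(int size) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
            // All-zero key and IV: acceptable for timing, never for real data.
            cipher.init(Cipher.DECRYPT_MODE,
                    new SecretKeySpec(new byte[16], "AES"),
                    new IvParameterSpec(new byte[16]));
            byte[] data = new byte[size];
            long start = System.nanoTime();
            byte[] out = cipher.doFinal(data);
            long elapsedNanos = System.nanoTime() - start;
            if (out.length != size) {
                throw new IllegalStateException("unexpected output length " + out.length);
            }
            return elapsedNanos / 1e6;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AES/CTR unavailable", e);
        }
    }

    public static void main(String[] args) {
        int size = 128 * 1024; // same block size as in the question
        System.out.println("AES/CTR over " + size + " bytes took "
                + timeDecryptMillis(size) + " ms");
    }
}
```

Running the same measurement on the device and against a native implementation would show whether the gap is large enough to justify the JNI work.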