When using AES encryption, plaintext must be padded to the cipher block size. Most libraries and standards use padding where the padding bytes can be determined from the unpadded plaintext length. Is there a benefit to using random padding bytes when possible?
I'm implementing a scheme for storing sensitive per-user and per-session data. The data will usually be JSON-encoded key-value pairs, and can be potentially short and repetitive. I'm looking to PKCS#5 for guidance, but I planned on using AES for the encryption algorithm rather than DES3. I was planning on a random IV per data item, and a key determined by the user ID and password or a session ID, as appropriate.
One thing that surprised me is the PKCS#5 padding scheme for the plaintext. To pad the ciphertext to 8-byte blocks, 1 to 8 bytes are added at the end, with the padding byte content reflecting the number of padding bytes (i.e. 01
, 0202
, 030303
, up to 0808080808080808
). My own padding scheme was to use random bytes at the front of the plaintext, and the last character of the plaintext would be the number of padding bytes added.
My reasoning was that in AES-CBC mode, each block is a function of the ciphertext of the preceding block. This way, each plaintext would have an element of randomness, giving me another layer of protection from known plaintext attacks, as well as IV and key issues. Since my plaintext is expected to be short, I don't mind holding the whole decrypted string in memory, and slicing padding off the front and back.
One drawback would be the same unpadded plaintext, IV, and key would result in different ciphertext, making unit testing difficult (but not impossible - I can use a pseudo-random padding generator for testing, and a cryptographically strong one for production).
Another would be that, to enforce random padding, I'd have to add a minimum of two bytes - a count and one random byte. For deterministic padding, the minimum is one byte, either stored with the plaintext or in the ciphertext wrapper.
Since a well-regarded standard like PKCS#5 decided to use deterministic padding, I'm wondering if there is something else I missed, or I'm judging the benefits too high.
Both, I suspect. The benefit is fairly minimal.
You have forgotten about the runtime cost of acquiring or generating cryptographic-quality random numbers. at one extreme, when a finite supply of randomness is available (/dev/random on some systems for instance), your code may have to wait a long time for more random bytes.
At the other extreme, when you are getting your random bytes from a PRNG, you could expose yourself to problems if you're using the same random source to generate your keys. If you're sending encrypted data to multiple recipients one after another, you have given the previous recipient a whole bunch of information about the state of the PRNG which will be used to pick the key for your next comms session. If your PRNG algorithm is ever broken, which is IMO more likely than a good plaintext attack on full AES, you're much worse off than if you had used deliberately-deterministic padding.
In either case, however you get the padding, it's more computationally intensive than PKCS#5 padding.
As an aside, it is fairly standard to compress potentially-repetitive data with e.g. deflate before encrypting it; this reduces the redundancy in the data, which can make certain attacks more difficult to perform.
One last recommendation: deriving the key with a mechanism in which only the username and password vary is very dangerous. If you are going to use it, make sure you use a Hash algorithm with no known flaws (not SHA-1, not MD-5). cf this slashdot story
Hope this helps.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With