I am encrypting data using Go's implementation of ChaCha20-Poly1305, but when encrypting large files the memory usage is higher than I expected. I understand that Go's AEAD implementations are one-shot, so the entire plaintext must be held in memory while the authentication tag is computed, but the memory usage is double the size of the plaintext.
The following small program, which attempts to encrypt 4 GiB of data, highlights this (the key and nonce should not be empty in a real-world program):
package main

import (
	"fmt"
	"os"
	"runtime"

	"golang.org/x/crypto/chacha20poly1305"
)

func main() {
	showMemUsage("START")
	plaintext := make([]byte, 4*1024*1024*1024) // 4 GiB
	showMemUsage("STAGE 1")
	key := make([]byte, chacha20poly1305.KeySize)
	if cipher, err := chacha20poly1305.New(key); err == nil {
		showMemUsage("STAGE 2")
		nonce := make([]byte, chacha20poly1305.NonceSize)
		cipher.Seal(plaintext[:0], nonce, plaintext, nil)
	}
	showMemUsage("END")
}

func showMemUsage(tag string) {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Fprintf(os.Stdout, "[%s] Alloc = %v MiB, TotalAlloc = %v MiB\n", tag, m.Alloc/1024/1024, m.TotalAlloc/1024/1024)
}
The source code of crypto/cipher/gcm.go (which defines the cipher.AEAD interface implemented by both AES-GCM and ChaCha20-Poly1305) documents Seal with the following comment:
// To reuse plaintext's storage for the encrypted output, use plaintext[:0]
// as dst. Otherwise, the remaining capacity of dst must not overlap plaintext.
Seal(dst, nonce, plaintext, additionalData []byte) []byte
This implies I should be able to reuse the memory, which I have attempted to do, but it makes no difference: after calling Seal(), the program always ends up using 8 GiB of memory to encrypt 4 GiB of data.
[START] Alloc = 0 MiB, TotalAlloc = 0 MiB
[STAGE 1] Alloc = 4096 MiB, TotalAlloc = 4096 MiB
[STAGE 2] Alloc = 4096 MiB, TotalAlloc = 4096 MiB
[END] Alloc = 8192 MiB, TotalAlloc = 8192 MiB
If the memory were being reused (as implied), I wouldn't expect any massive increase, only the relatively small authentication tag that AEAD ciphers append to the ciphertext. Why is the memory doubling?
You forgot to account for the authentication tag that is appended to the ciphertext. If you make room for it in the initial allocation, no further allocations are required:
package main

import (
	"fmt"
	"os"
	"runtime"

	"golang.org/x/crypto/chacha20poly1305"
)

func main() {
	showMemUsage("START")
	plaintext := make([]byte, 4<<30, 4<<30+chacha20poly1305.Overhead)
	showMemUsage("STAGE 1")
	key := make([]byte, chacha20poly1305.KeySize)
	if cipher, err := chacha20poly1305.New(key); err == nil {
		showMemUsage("STAGE 2")
		nonce := make([]byte, chacha20poly1305.NonceSize)
		cipher.Seal(plaintext[:0], nonce, plaintext, nil)
	}
	showMemUsage("END")
}

func showMemUsage(tag string) {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Fprintf(os.Stdout, "[%s] Alloc = %v MiB, TotalAlloc = %v MiB\n", tag, m.Alloc>>20, m.TotalAlloc>>20)
}
// Output:
// [START] Alloc = 0 MiB, TotalAlloc = 0 MiB
// [STAGE 1] Alloc = 4096 MiB, TotalAlloc = 4096 MiB
// [STAGE 2] Alloc = 4096 MiB, TotalAlloc = 4096 MiB
// [END] Alloc = 4096 MiB, TotalAlloc = 4096 MiB
Overhead is the size of the Poly1305 authentication tag: the difference between the length of a ciphertext and that of its plaintext.
Yes, unfortunately this problem exists in many libraries that implement AEAD ciphers; one-shot implementations are quite common. There is something to be said for them, as they protect the user against consuming unverified data during decryption. That protection is less of a concern if the data is written to disk: in that case the data isn't used directly, and the temporary file can be destroyed if the authentication tag doesn't verify.
The same might be said when the data will be deserialized into objects. Developers should, however, beware that deserializing unverified (effectively attacker-controlled) data can expose vulnerabilities.
This can be solved in two ways.
The first is to split the ciphertext into chunks of a size that comfortably fits into memory; each chunk can then be verified individually. That does leave room for adding, removing, or shuffling chunks, which can be prevented by computing an additional authentication tag over all the chunks' authentication tags.
It would be a good idea to check the validity of the authentication tags before handling the data within the chunks. Those tags could be stored at the start or end of the ciphertext, rather than at the end of each chunk, to avoid having to pass over the ciphertext twice (which would likely increase the number of I/O operations).
The second option is to use a stream cipher such as ChaCha20 together with an HMAC, which at this scale can be more efficient. Don't forget to include the IV/nonce in the HMAC calculation if you use HMAC instead of a pre-made AEAD construction.
Regardless of your scheme, please document it explicitly and include a version number with your ciphertext; you may want to upgrade to another protocol later.