
Go Memory Usage with cipher.AEAD.Seal()

I am encrypting data using Go's implementation of ChaCha20-Poly1305, but when I encrypt large files the memory usage is higher than I expected. I understand that Go's one-shot AEAD API means the entire plaintext has to be held in memory so that the authentication tag can be computed, but the memory usage is double the size of the plaintext.

The following small program, which attempts to encrypt 4 GiB of data, highlights this (the key and nonce shouldn't be all zeros in a real-world program):

package main

import (
  "os"
  "fmt"
  "runtime"
  "golang.org/x/crypto/chacha20poly1305"
)

func main() {
  showMemUsage("START")

  plaintext := make([]byte, 4 * 1024 * 1024 * 1024) // 4 GiB

  showMemUsage("STAGE 1")

  key := make([]byte, chacha20poly1305.KeySize)
  if cipher, err := chacha20poly1305.New(key); err == nil {
    showMemUsage("STAGE 2")

    nonce := make([]byte, chacha20poly1305.NonceSize)
    cipher.Seal(plaintext[:0], nonce, plaintext, nil)
  }

  showMemUsage("END")
}

func showMemUsage(tag string) {
  var m runtime.MemStats

  runtime.ReadMemStats(&m)
  fmt.Fprintf(os.Stdout, "[%s] Alloc = %v MiB, TotalAlloc = %v MiB\n", tag, m.Alloc / 1024 / 1024, m.TotalAlloc / 1024 / 1024)
}

The cipher.AEAD interface, which is declared in crypto/cipher/gcm.go and implemented by both AES-GCM and ChaCha20-Poly1305, documents Seal with the following comment:

// To reuse plaintext's storage for the encrypted output, use plaintext[:0]
// as dst. Otherwise, the remaining capacity of dst must not overlap plaintext.
Seal(dst, nonce, plaintext, additionalData []byte) []byte

This implies I should be able to re-use the memory, which I have attempted to do, but it makes no difference to the amount of memory that my application uses. After calling Seal() we always end up using 8 GiB of memory to encrypt 4 GiB of data:

[START] Alloc = 0 MiB, TotalAlloc = 0 MiB
[STAGE 1] Alloc = 4096 MiB, TotalAlloc = 4096 MiB
[STAGE 2] Alloc = 4096 MiB, TotalAlloc = 4096 MiB
[END] Alloc = 8192 MiB, TotalAlloc = 8192 MiB

If it were re-using the memory (as implied), I wouldn't expect any large increase, given that AEAD ciphers only append a relatively small authentication tag to the ciphertext. Why does the usage double?

chrixm asked Dec 29 '25 06:12


2 Answers

You forgot to account for the authentication tag that is appended to the ciphertext. If you make room for it in the initial allocation, no further allocations are required:

package main

import (
        "fmt"
        "os"
        "runtime"

        "golang.org/x/crypto/chacha20poly1305"
)

func main() {
        showMemUsage("START")

        plaintext := make([]byte, 4<<30, 4<<30+chacha20poly1305.Overhead)

        showMemUsage("STAGE 1")

        key := make([]byte, chacha20poly1305.KeySize)
        if cipher, err := chacha20poly1305.New(key); err == nil {
                showMemUsage("STAGE 2")

                nonce := make([]byte, chacha20poly1305.NonceSize)
                cipher.Seal(plaintext[:0], nonce, plaintext, nil)
        }

        showMemUsage("END")
}

func showMemUsage(tag string) {
        var m runtime.MemStats

        runtime.ReadMemStats(&m)
        fmt.Fprintf(os.Stdout, "[%s] Alloc = %v MiB, TotalAlloc = %v MiB\n", tag, m.Alloc>>20, m.TotalAlloc>>20)
}

// Output:
// [START] Alloc = 0 MiB, TotalAlloc = 0 MiB
// [STAGE 1] Alloc = 4096 MiB, TotalAlloc = 4096 MiB
// [STAGE 2] Alloc = 4096 MiB, TotalAlloc = 4096 MiB
// [END] Alloc = 4096 MiB, TotalAlloc = 4096 MiB

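chacha20poly1305.Overhead is the size of the Poly1305 authentication tag (16 bytes), which is also the difference between the length of a ciphertext and the length of its plaintext. Seal appends its output to dst, so when dst does not have enough capacity for the plaintext plus the tag, it has to allocate a new buffer for the whole result. That is where the second 4 GiB came from.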
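
To check the behaviour without allocating gigabytes, here is a minimal sketch that compares the address of the first byte before and after Seal. It uses AES-GCM from the standard library as a stand-in, on the assumption that any cipher.AEAD grows dst the same way; sealsInPlace is a hypothetical helper name, and the all-zero key and nonce are for illustration only.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"fmt"
)

// sealsInPlace reports whether Seal reused the plaintext's backing array.
// When reserveTag is true the buffer gets Overhead() spare bytes of capacity.
func sealsInPlace(reserveTag bool) bool {
	// All-zero key and nonce: illustration only, never do this for real data.
	block, err := aes.NewCipher(make([]byte, 32))
	if err != nil {
		panic(err)
	}
	aead, err := cipher.NewGCM(block)
	if err != nil {
		panic(err)
	}

	extra := 0
	if reserveTag {
		extra = aead.Overhead()
	}
	buf := make([]byte, 1024, 1024+extra)
	nonce := make([]byte, aead.NonceSize())
	out := aead.Seal(buf[:0], nonce, buf, nil)

	// Same first-element address means no new backing array was allocated.
	return &out[0] == &buf[0]
}

func main() {
	fmt.Println("no spare capacity, in place:", sealsInPlace(false))
	fmt.Println("room for the tag, in place:", sealsInPlace(true))
}
```

Swapping in chacha20poly1305.New should give the same picture, since only the capacity of dst matters here.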

Peter answered Dec 30 '25 19:12


Yes, unfortunately this problem exists for many libraries that implement AEAD ciphers. One-shot implementations are quite common. There is something to be said for them, as they protect the user against using unverified data released during decryption. That protection is less of a concern if the data is written to disk: in that case the data isn't used directly, and the temporary file can be destroyed if the authentication tag doesn't match.

It may be considered a problem if the data will be deserialized to objects as well. Developers should, however, beware that deserializing unverified (effectively random) data could expose vulnerabilities.

This can be solved in two ways:

  1. find a better implementation that does allow streaming;
  2. alter the protocol.

One such alteration is the splitting of the ciphertext into chunks of a size that comfortably does fit into memory. Then these chunks can be verified individually. That does leave room for adding, removing or shuffling the chunks around though. That can be avoided by adding an additional authentication tag over all the other authentication tags of said chunks.

It would be a good idea to first check the validity of the authentication tags before handling the data within the chunks. Those authentication tags could be stored at the start or end of the ciphertext instead of at the end of each chunk to avoid having to pass over the ciphertext twice (which would likely increase the amount of IO ops).
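
A rough sketch of the chunk idea, under several assumptions of mine: AES-GCM from the standard library stands in for whichever AEAD you use, the nonce is a simple per-chunk counter (so the key must be unique per file), and the outer tag is an HMAC-SHA256 with its own key. The names sealChunks, outerTagMatches and newDemoAEAD are hypothetical.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/hmac"
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// newDemoAEAD builds an AEAD with an all-zero key, for illustration only.
func newDemoAEAD() cipher.AEAD {
	block, err := aes.NewCipher(make([]byte, 32))
	if err != nil {
		panic(err)
	}
	aead, err := cipher.NewGCM(block)
	if err != nil {
		panic(err)
	}
	return aead
}

// sealChunks encrypts data in chunkSize pieces and returns the sealed chunks
// plus an outer HMAC computed over every chunk's authentication tag, in order.
func sealChunks(aead cipher.AEAD, macKey, data []byte, chunkSize int) ([][]byte, []byte) {
	outer := hmac.New(sha256.New, macKey)
	nonce := make([]byte, aead.NonceSize())
	var chunks [][]byte
	for i := uint64(0); len(data) > 0; i++ {
		n := chunkSize
		if n > len(data) {
			n = len(data)
		}
		// Counter nonce: safe only if this key is never used for another file.
		binary.BigEndian.PutUint64(nonce[len(nonce)-8:], i)
		sealed := aead.Seal(nil, nonce, data[:n], nil)
		outer.Write(sealed[n:]) // the tag is the trailing Overhead() bytes
		chunks = append(chunks, sealed)
		data = data[n:]
	}
	return chunks, outer.Sum(nil)
}

// outerTagMatches recomputes the outer HMAC from the chunk tags, so added,
// removed or reordered chunks are detected before any chunk is decrypted.
func outerTagMatches(aead cipher.AEAD, macKey []byte, chunks [][]byte, want []byte) bool {
	outer := hmac.New(sha256.New, macKey)
	for _, c := range chunks {
		if len(c) < aead.Overhead() {
			return false
		}
		outer.Write(c[len(c)-aead.Overhead():])
	}
	return hmac.Equal(outer.Sum(nil), want)
}

func main() {
	aead := newDemoAEAD()
	macKey := []byte("separate demo MAC key")
	chunks, tag := sealChunks(aead, macKey, make([]byte, 10), 4)
	fmt.Println("chunks:", len(chunks), "intact:", outerTagMatches(aead, macKey, chunks, tag))
	fmt.Println("truncated accepted:", outerTagMatches(aead, macKey, chunks[:2], tag))
}
```

In a real program each chunk would be written out as soon as it is sealed instead of being collected in a slice; only the running HMAC state needs to stay in memory.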

At this point, using a stream cipher such as ChaCha20 together with HMAC could be more efficient. If you use HMAC instead of a pre-made AEAD construction, don't forget to include the IV in the HMAC calculation.
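
As an illustration of encrypt-then-MAC with the IV covered by the HMAC, here is a sketch that streams from a reader to a writer. To stay within the standard library it uses AES-CTR as a stand-in for ChaCha20, which is my substitution; the output layout (IV, then ciphertext, then tag) and the helper names are assumptions as well.

```go
package main

import (
	"bytes"
	"crypto/aes"
	"crypto/cipher"
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
	"errors"
	"fmt"
	"io"
	"strings"
)

// encryptThenMAC streams src to dst as IV || ciphertext || HMAC tag.
// The HMAC covers the IV as well as the ciphertext.
func encryptThenMAC(dst io.Writer, src io.Reader, encKey, macKey []byte) error {
	block, err := aes.NewCipher(encKey)
	if err != nil {
		return err
	}
	iv := make([]byte, aes.BlockSize)
	if _, err := rand.Read(iv); err != nil {
		return err
	}
	mac := hmac.New(sha256.New, macKey)
	out := io.MultiWriter(dst, mac) // every byte sent to dst is also MACed
	if _, err := out.Write(iv); err != nil {
		return err
	}
	w := cipher.StreamWriter{S: cipher.NewCTR(block, iv), W: out}
	if _, err := io.Copy(w, src); err != nil {
		return err
	}
	_, err = dst.Write(mac.Sum(nil)) // the tag itself goes to dst only
	return err
}

// open verifies the tag over IV || ciphertext before decrypting anything.
// It works on a byte slice to keep the sketch short.
func open(blob, encKey, macKey []byte) ([]byte, error) {
	if len(blob) < aes.BlockSize+sha256.Size {
		return nil, errors.New("ciphertext too short")
	}
	body, tag := blob[:len(blob)-sha256.Size], blob[len(blob)-sha256.Size:]
	mac := hmac.New(sha256.New, macKey)
	mac.Write(body)
	if !hmac.Equal(mac.Sum(nil), tag) {
		return nil, errors.New("authentication failed")
	}
	block, err := aes.NewCipher(encKey)
	if err != nil {
		return nil, err
	}
	plain := make([]byte, len(body)-aes.BlockSize)
	cipher.NewCTR(block, body[:aes.BlockSize]).XORKeyStream(plain, body[aes.BlockSize:])
	return plain, nil
}

// roundTrip encrypts msg, optionally flips one bit of the IV, then opens it.
func roundTrip(msg string, tamperIV bool) (string, error) {
	// Fixed demo keys; real code needs two independent random keys.
	encKey, macKey := make([]byte, 32), bytes.Repeat([]byte{1}, 32)
	var buf bytes.Buffer
	if err := encryptThenMAC(&buf, strings.NewReader(msg), encKey, macKey); err != nil {
		return "", err
	}
	blob := buf.Bytes()
	if tamperIV {
		blob[0] ^= 1
	}
	plain, err := open(blob, encKey, macKey)
	return string(plain), err
}

func main() {
	fmt.Println(roundTrip("hello", false))
	fmt.Println(roundTrip("hello", true))
}
```

Because the IV passes through the same MultiWriter as the ciphertext, flipping a bit in the IV makes verification fail, which would not be the case if only the ciphertext were MACed.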


Regardless of your scheme, please explicitly document it and include a version number with your ciphertext; you may want to upgrade to another protocol...
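
One possible way to carry a version number, sketched under my own assumptions: a single leading version byte that is also passed to Seal as additional data, so it is authenticated along with the ciphertext. The layout, the constant formatV1 and the helper names are hypothetical, and AES-GCM again stands in for your AEAD of choice.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"errors"
	"fmt"
)

const formatV1 byte = 1 // hypothetical version identifier

// newDemoAEAD builds an AEAD with an all-zero key, for illustration only.
func newDemoAEAD() cipher.AEAD {
	block, err := aes.NewCipher(make([]byte, 32))
	if err != nil {
		panic(err)
	}
	aead, err := cipher.NewGCM(block)
	if err != nil {
		panic(err)
	}
	return aead
}

// sealV1 returns version || ciphertext || tag. The version byte is passed as
// additional data, so changing it afterwards invalidates the tag.
func sealV1(aead cipher.AEAD, nonce, plaintext []byte) []byte {
	out := []byte{formatV1}
	return aead.Seal(out, nonce, plaintext, []byte{formatV1})
}

// openVersioned dispatches on the leading version byte before decrypting.
func openVersioned(aead cipher.AEAD, nonce, blob []byte) ([]byte, error) {
	if len(blob) < 1 {
		return nil, errors.New("empty input")
	}
	switch blob[0] {
	case formatV1:
		return aead.Open(nil, nonce, blob[1:], blob[:1])
	default:
		return nil, fmt.Errorf("unsupported format version %d", blob[0])
	}
}

func main() {
	aead := newDemoAEAD()
	nonce := make([]byte, aead.NonceSize()) // zero nonce: demo only
	blob := sealV1(aead, nonce, []byte("hi"))
	plain, err := openVersioned(aead, nonce, blob)
	fmt.Println(string(plain), err)
}
```

A later protocol revision then only needs a new case in the switch, while old files remain readable.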

Maarten Bodewes answered Dec 30 '25 20:12