
How do you encrypt large files / byte streams in Go?

I have some large files I would like to AES-encrypt before sending them over the wire or saving them to disk. While it seems possible to encrypt streams, there seem to be warnings against doing so; instead, people recommend splitting the files into chunks and using GCM or crypto/nacl/secretbox.

Processing streams of data is more difficult due to the authenticity requirement. We can’t encrypt-then-MAC: by its nature, we usually don’t know the size of a stream. We can’t send the MAC after the stream is complete, as that usually is indicated by the stream being closed. We can’t decrypt a stream on the fly, because we have to see the entire ciphertext in order to check the MAC. Attempting to secure a stream adds enormous complexity to the problem, with no good answers. The solution is to break the stream into discrete chunks, and treat them as messages.

  • https://leanpub.com/gocrypto/read

Files are segmented into 4KiB blocks. Each block gets a fresh random 128 bit IV each time it is modified. A 128-bit authentication tag (GHASH) protects each block from modifications.

  • https://nuetzlich.net/gocryptfs/forward_mode_crypto/

If a large amount of data is decrypted it is not always possible to buffer all decrypted data until the authentication tag is verified. Splitting the data into small chunks fixes the problem of deferred authentication checks but introduces a new one. The chunks can be reordered... ...because every chunk is encrypted separately. Therefore the order of the chunks must be encoded somehow into the chunks themselves to be able to detect rearranging any number of chunks.

  • https://github.com/minio/sio

Can anyone with actual cryptography experience point me in the right direction?

Update

I realized after asking this question that there is a difference between simply not being able to fit the whole byte stream into memory (encrypting a 10GB file) and the byte stream also being of unknown length, possibly continuing long after the start of the stream needs to be decoded (a 24-hour live video stream).

I am mostly interested in large blobs where the end of the stream can be reached before the beginning needs to be decoded. In other words, encryption that does not require the whole plaintext/ciphertext to be loaded into memory at the same time.

asked Mar 29 '18 01:03 by Xeoncross



1 Answer

As you've already discovered from your research, there is no truly elegant solution for authenticated encryption of large files.

There are traditionally two ways to approach this problem:

  • Split the file into chunks, encrypt each chunk individually and let each chunk have its own authentication tag. AES-GCM would be the best mode to use for this. The output grows in proportion to the size of the file, since every chunk carries its own tag. You'll also need a unique nonce for each chunk and a way to indicate where chunks begin and end.

  • Encrypt using AES-CTR with a buffer, and call Hash.Write on an HMAC for each buffer of encrypted data. The benefit is that encryption can be done in one pass, and the file size stays the same apart from roughly 48 bytes for the IV and HMAC result. The downside is that decryption requires one pass to validate the HMAC and then another pass to actually decrypt.

Neither is ideal, but for very large files (roughly 2 GB or more), the second option is probably preferred.
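For comparison, the first (chunked) approach could look roughly like the sketch below. It is not a vetted file format, and every name and size in it is an assumption made for illustration: 64 KiB chunks, a random 7-byte per-file nonce prefix, and the helpers chunkNonce, pump, encryptChunks, decryptChunks and viaBytes. Each chunk's nonce carries a chunk counter and a "final chunk" flag, so that reordered or truncated input fails authentication. Note that decryptChunks hands out plaintext chunk by chunk, before the rest of the file has been checked.

```go
package main

import (
    "bytes"
    "crypto/aes"
    "crypto/cipher"
    "crypto/rand"
    "encoding/binary"
    "io"
)

const (
    chunkSize  = 64 * 1024 // plaintext bytes per chunk (arbitrary, assumed)
    prefixSize = 7         // random per-file nonce prefix (assumed layout)
)

// chunkNonce builds the 12-byte GCM nonce: prefix || chunk counter || final flag.
func chunkNonce(prefix []byte, counter uint32, final bool) []byte {
    nonce := make([]byte, 12)
    copy(nonce, prefix)
    binary.BigEndian.PutUint32(nonce[prefixSize:], counter)
    if final {
        nonce[11] = 1
    }
    return nonce
}

func newGCM(key []byte) (cipher.AEAD, error) {
    block, err := aes.NewCipher(key)
    if err != nil {
        return nil, err
    }
    return cipher.NewGCM(block)
}

// pump reads size-byte chunks from r, transforms each with fn, and writes the
// results to w. A short or empty read marks the final chunk.
func pump(r io.Reader, w io.Writer, size int, prefix []byte, fn func(nonce, chunk []byte) ([]byte, error)) error {
    buf := make([]byte, size)
    for counter := uint32(0); ; counter++ { // no overflow guard in this sketch
        n, err := io.ReadFull(r, buf)
        final := err == io.EOF || err == io.ErrUnexpectedEOF
        if err != nil && !final {
            return err
        }
        out, err := fn(chunkNonce(prefix, counter, final), buf[:n])
        if err != nil {
            return err
        }
        if _, err := w.Write(out); err != nil {
            return err
        }
        if final {
            return nil
        }
    }
}

// encryptChunks writes prefix || sealed chunk 0 || sealed chunk 1 || ... to w.
func encryptChunks(key []byte, r io.Reader, w io.Writer) error {
    gcm, err := newGCM(key)
    if err != nil {
        return err
    }
    prefix := make([]byte, prefixSize)
    if _, err := rand.Read(prefix); err != nil {
        return err
    }
    if _, err := w.Write(prefix); err != nil {
        return err
    }
    return pump(r, w, chunkSize, prefix, func(nonce, chunk []byte) ([]byte, error) {
        return gcm.Seal(nil, nonce, chunk, nil), nil
    })
}

// decryptChunks authenticates each chunk before writing its plaintext to w.
func decryptChunks(key []byte, r io.Reader, w io.Writer) error {
    gcm, err := newGCM(key)
    if err != nil {
        return err
    }
    prefix := make([]byte, prefixSize)
    if _, err := io.ReadFull(r, prefix); err != nil {
        return err
    }
    return pump(r, w, chunkSize+gcm.Overhead(), prefix, func(nonce, chunk []byte) ([]byte, error) {
        return gcm.Open(nil, nonce, chunk, nil)
    })
}

// viaBytes runs encryptChunks or decryptChunks over an in-memory slice.
func viaBytes(fn func([]byte, io.Reader, io.Writer) error, key, data []byte) ([]byte, error) {
    var out bytes.Buffer
    err := fn(key, bytes.NewReader(data), &out)
    return out.Bytes(), err
}
```

With these assumed sizes the overhead is 7 bytes for the prefix plus 16 bytes of tag per chunk, and at most one chunk is held in memory at a time. For real files, encryptChunks and decryptChunks can be passed an *os.File directly, since they only need an io.Reader and an io.Writer.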

I have included an example of encryption in Go using the second method below. In this scenario, the last 48 bytes of the output are the IV (16 bytes) followed by the result of the HMAC (32 bytes). Note that the IV is also included in the HMAC.

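import (
    "crypto/aes"
    "crypto/cipher"
    "crypto/hmac"
    "crypto/rand"
    "crypto/sha256"
    "io"
    "os"
)

const (
    bufferSize = 4096
    ivSize     = aes.BlockSize // 16 bytes
)

// encrypt AES-CTR encrypts filePathIn into filePathOut, then appends the IV
// (16 bytes) and an HMAC-SHA256 tag (32 bytes) computed over ciphertext || IV.
func encrypt(filePathIn, filePathOut string, keyAes, keyHmac []byte) error {
    inFile, err := os.Open(filePathIn)
    if err != nil {
        return err
    }
    defer inFile.Close()

    outFile, err := os.Create(filePathOut)
    if err != nil {
        return err
    }
    defer outFile.Close()

    iv := make([]byte, ivSize)
    if _, err := rand.Read(iv); err != nil {
        return err
    }

    // Local names avoid shadowing the aes and hmac packages.
    block, err := aes.NewCipher(keyAes)
    if err != nil {
        return err
    }
    ctr := cipher.NewCTR(block, iv)
    mac := hmac.New(sha256.New, keyHmac)

    // Every ciphertext byte written to outFile is also fed to the HMAC.
    w := io.MultiWriter(outFile, mac)

    buf := make([]byte, bufferSize)
    for {
        n, err := inFile.Read(buf)
        if n > 0 {
            ctr.XORKeyStream(buf[:n], buf[:n]) // encrypt in place
            if _, werr := w.Write(buf[:n]); werr != nil {
                return werr
            }
        }
        if err == io.EOF {
            break
        }
        if err != nil {
            return err
        }
    }

    // Trailer: the IV (also authenticated), then the HMAC tag.
    if _, err := w.Write(iv); err != nil {
        return err
    }
    if _, err := outFile.Write(mac.Sum(nil)); err != nil {
        return err
    }
    return outFile.Close() // report close errors; the deferred Close is then a no-op
}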
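
The answer only shows encryption. A minimal sketch of the matching two-pass decryption might look like the following, assuming the layout described above (ciphertext, then the 16-byte IV, then the 32-byte HMAC). The names decrypt, trailerSize, encryptBytes and decryptBytes are hypothetical and not part of the original answer; encryptBytes only exists so the sketch can be tried in memory.

```go
package main

import (
    "bytes"
    "crypto/aes"
    "crypto/cipher"
    "crypto/hmac"
    "crypto/rand"
    "crypto/sha256"
    "errors"
    "io"
)

const trailerSize = aes.BlockSize + sha256.Size // 16-byte IV + 32-byte HMAC tag

// decrypt expects ciphertext || IV || HMAC-SHA256(ciphertext || IV) and writes
// plaintext to out only after the tag has been verified.
func decrypt(in io.ReadSeeker, out io.Writer, keyAes, keyHmac []byte) error {
    // Locate and read the trailer: IV followed by the expected tag.
    bodyLen, err := in.Seek(-trailerSize, io.SeekEnd)
    if err != nil {
        return err // also covers inputs shorter than the trailer
    }
    trailer := make([]byte, trailerSize)
    if _, err := io.ReadFull(in, trailer); err != nil {
        return err
    }
    iv, wantTag := trailer[:aes.BlockSize], trailer[aes.BlockSize:]

    // Pass 1: recompute the HMAC over ciphertext || IV, compare in constant time.
    if _, err := in.Seek(0, io.SeekStart); err != nil {
        return err
    }
    mac := hmac.New(sha256.New, keyHmac)
    if _, err := io.CopyN(mac, in, bodyLen); err != nil {
        return err
    }
    mac.Write(iv)
    if !hmac.Equal(mac.Sum(nil), wantTag) {
        return errors.New("HMAC mismatch: data corrupted or wrong key")
    }

    // Pass 2: the data is authentic, so stream-decrypt it with AES-CTR.
    block, err := aes.NewCipher(keyAes)
    if err != nil {
        return err
    }
    if _, err := in.Seek(0, io.SeekStart); err != nil {
        return err
    }
    r := cipher.StreamReader{S: cipher.NewCTR(block, iv), R: io.LimitReader(in, bodyLen)}
    _, err = io.Copy(out, r)
    return err
}

// encryptBytes builds the same layout in memory: ciphertext || IV || tag.
func encryptBytes(plain, keyAes, keyHmac []byte) ([]byte, error) {
    block, err := aes.NewCipher(keyAes)
    if err != nil {
        return nil, err
    }
    iv := make([]byte, aes.BlockSize)
    if _, err := rand.Read(iv); err != nil {
        return nil, err
    }
    out := make([]byte, len(plain), len(plain)+trailerSize)
    cipher.NewCTR(block, iv).XORKeyStream(out, plain)
    out = append(out, iv...)
    mac := hmac.New(sha256.New, keyHmac)
    mac.Write(out)
    return mac.Sum(out), nil
}

// decryptBytes runs decrypt over an in-memory buffer.
func decryptBytes(data, keyAes, keyHmac []byte) ([]byte, error) {
    var out bytes.Buffer
    if err := decrypt(bytes.NewReader(data), &out, keyAes, keyHmac); err != nil {
        return nil, err
    }
    return out.Bytes(), nil
}
```

Because decrypt takes an io.ReadSeeker, an *os.File returned by os.Open can be passed in directly. Verifying the HMAC before decrypting anything is what costs the second pass over the file.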

answered Sep 21 '22 03:09 by Luke Joshua Park