Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can Haskell or OCAML handle sensitive data without leaking via garbage collection?

I would do something like this (pseudo code):

1. load sensitive encrypted data from file
2. decrypt the data
3. do something with the unencrypted data
4. override the data safely / securely (for example with random data)

The time that the sensitive data lies plain (unencrypted) in memory should be as short as possible.

The plain data must not be leaked in any way.

A. Can such a program be written in Haskell or OCAML?

B. Can it be prevented that the data gets leaked, i.e. by being silently copied in the background by the garbage collector?

C. Can the plain data be properly overridden in memory?

As far as I know garbage collectors (GCs) can make copies of data silently in the background. I guess that is done by generational GC algorithms, but I don't know for sure.

I know that it still would be possible for an attacker to get the plain data if the attacker manages to get the memory of the program at the right time / state. I just consider to do that to raise security because I do not have the context (i.e. OS, swapping etc.) under control.

like image 340
user573215 Avatar asked Jun 02 '20 09:06

user573215


People also ask

Should I learn Haskell or OCaml?

Where Haskell is doing more-or-less fine with libraries, OCaml has significantly less to propose. There are some nice libraries in OCaml, but in many areas the situation is not perfect. For example, you would expect a modern Unicode-aware string type in your language of choice.

What is an example of sensitive data leak?

Hardcoded secrets are an example of a sensitive data leak. Sensitive data leaks happen when an application exposes sensitive data, such as credentials, secret keys, personal information, or configuration information, to people who shouldn’t have access to that information.

What makes Haskell so special?

Haskell is pure and lazy, that’s a big game changer. Purity is imposed by the type system and it is one of the main features that gives Haskell its unique feel. Once you approach language design this bravely, doing away with the old ways of writing programs, you get a very different language in the end.

Why isn't OCaml more popular?

Because OCaml is impure, you can do effects anywhere you want. One consequence of this is that monads, although not unknown of in the OCaml world, are less popular and the support for monads is way worse. There is no do -notation and no proven ways to combine monadic layers, athough it looks like it’s about to change.


2 Answers

I already mentioned this in a comment, but I think it is a really good question and deserves an answer.

There is already a data type ScrubbedBytes that has the following properties:

  • It is allocated with newAlignedPinnedByteArray#, which means while the newly allocated MutableByteArray# is referenced anywhere in your code it will not be GCed, but it will also not going to get moved or copied around.
  • Upon allocation a weak pointer is created with mkWeak# and a finalizer gets added to the newly allocated memory. This means that whenever scrubbed bytes are no longer referenced in your code and before GC deallocates the memory a scrubber will get invoked that will write zeros into the memory.
  • Equality will not short circuit, thus guarding against timing attacks.

There is one small gotcha to this scrubber. There is a small chance that it will not get executed, in particular if a program exits right before the GC should cleanup the memory. (See more info on weak pointers.) Therefore, I would recommend implementing it using bracket pattern. Here is how you can get it done with primitive package:

import Control.Exception
import Control.Monad.Primitive (RealWorld)
import qualified Data.Primitive.ByteArray as BA

withScrubbedMutableByteArray ::
     Int -- ^ Number of bytes
  -> (BA.MutableByteArray RealWorld -> IO a)
  -- ^ Action to execute
  -> IO a
withScrubbedMutableByteArray c f = do
  mba <- BA.newPinnedByteArray c
  f mba `finally` BA.setByteArray mba 0 c (0 :: Word8)

Reason why using finally is safer is because you will have stronger guarantees that the memory will be zeroed out. For example user hitting Ctrl-C in a correct setup will not prevent scrubber from running.

like image 154
lehins Avatar answered Oct 31 '22 05:10

lehins


In OCaml it can be easily done using Bigarrays which are not governed by GC, never copied, and never examined by it. You can use Unix.map_file to load it and ocaml-struct to handle the loaded data nicely (if it is structured). OCaml is used extensively for writing low-level security-related code, see the mirage project (it has tons of cryptographic-related libraries), ocaml-tls a pure implementation of the TLS protocol in OCaml, and Project Everest which uses OCaml as the target language.

When decrypting/encrypting and otherwise processing the secret data you should be careful and do not put it in a boxed type, including strings and int64 integers. If you will take a look at mirage-crypto you will find out that all algorithms are implemented using integers only, which are represented as immediates and are never touched by GC.

like image 31
ivg Avatar answered Oct 31 '22 05:10

ivg