Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Securing data in the google app engine datastore

Our google app engine app stores a fair amount of personally identifying information (email, ssn, etc) to identify users. I'm looking for advice as to how to secure that data.

My current strategy

Store the sensitive data in two forms:

  • Hashed - using SHA-2 and a salt
  • Encrypted - using public/private key RSA

When we need to do look ups:

  • Do look-ups on the hashed data (hash the PII in a query, compare it to the hashed PII in the datastore).

If we ever need to re-hash the data or otherwise deal with it in a raw form:

  • Decrypt the encrypted version with our private key. Never store it in raw form, just process it then re-hash & re-encrypt it.

My concerns

Keeping our hash salt secret

If an attacker gets ahold of the data in the datastore, as well as our hash salt, I'm worried they could brute force the sensitive data. Some of it (like SSN, a 9-digit number) does not have a big key space, so even with a modern hash algorithm I believe it could be done if the attacker knew the salt.

My current idea is to keep the salt out of source control and in it's own file. That file gets loaded on to GAE during deployment and the app reads the file when it needs to hash incoming data.

In between deployments the salt file lives on a USB key protected by an angry bear (or a safe deposit box).

With the salt only living in two places

  1. The USB key
  2. Deployed to google apps

and with code download permanently disabled, I can't think of a way for someone to get ahold of the salt without stealing that USB key. Am I missing something?

Keeping our private RSA key secret

Less worried about this. It will be rare that we'll need to decrypt the encrypted version (only if we change the hash algorithm or data format).

The private key never has to touch the GAE server, we can pull down the encrypted data, decrypt it locally, process it, and re-upload the encrypted / hashed versions.

We can keep our RSA private key on a USB stick guarded by a bear AND a tiger, and only bring it out when we need it.


I realize this question isn't exactly google apps specific, but I think GAE makes the situation somewhat unique.

If I had total control, I'd do things like lock down deployment access and access to the datastore viewer with two-factor authentication, but those options aren't available at the moment (Having a GAE specific password is good, but I like having RSA tokens involved).

I'm also neither a GAE expert nor a security expert, so if there's a hole I'm missing or something I'm not thinking of specific to the platform, I would love to hear it.

like image 228
Rob Boyle Avatar asked Apr 10 '12 20:04

Rob Boyle


People also ask

What is unusual about Google's App Engine Datastore?

Unlike traditional relational databases which enforce a schema, Datastore is schemaless. It doesn't require entities of the same kind to have a consistent set of properties (although you can choose to enforce such a requirement in your own application code).

What are the different ways of storing application data in Google App Engine?

To store data and files on App Engine, you can use Google Cloud services or any other storage service that is supported by your language and is accessible from your App Engine instance. Third-party databases can be hosted on another cloud provider, hosted on premises, or managed by a third-party vendor.

What type of encryption does Datastore use to encrypt each entity before it is written to disk?

Each Datastore mode object's data and metadata is encrypted under the Advanced Encryption Standard (AES), and each encryption key is itself encrypted with a regularly rotated set of master keys.


1 Answers

When deciding on a security architecture, the first thing in your mind should always be threat models. Who are your potential attackers, what are their capabilities, and how can you defend against them? Without a clear idea of your threat model, you've got no way to assess whether or not your proposed security measures are sufficient, or even if they're necessary.

From your text, I'm guessing you're seeking to protect against some subset of the following:

  1. An attacker who compromises your datastore data, but not your application code.
  2. An attacker who obtains access to credentials to access the admin console of your app and can deploy new code.

For the former, encrypting or hashing your datastore data is likely sufficient (but see the caveats later in this answer). Protecting against the latter is tougher, but as long as your admin users can't execute arbitrary code without deploying a new app version, storing your keys in a module that's not checked in to source control, as you suggest, ought to work just fine, since even with admin access, they can't recover the keys, nor can they deploy a new version that reveals the keys to them. Make sure to disable downloading of source, obviously.

You rightly note some concerns about hashing of data with a limited amount of entropy - and you're right to be concerned. To some degree, salts can help with this by preventing precomputation attacks, and key stretching, such as that employed in PBKDF2, scrypt, and bcrypt, can make your attacker's life harder by increasing the amount of work they have to do. However, with something like SSN, your keyspace is simply so small that no amount of key stretching is going to help - if you hash the data, and the attacker gets the hash, they will be able to determine the original SSN.

In such situations, your only viable approach is to encrypt the data with a secret key. Now your attacker is forced to brute-force the key in order to get the data, a challenge that is orders of magnitude harder.

In short, my recommendation would be to encrypt your data using a standard (private key) cipher, with the key stored in a module not in source control. Using hashing instead will only weaken your data, while using public key cryptography doesn't provide appreciable security against any plausible threat model that you don't already have by using a standard cipher.

Of course, the number one way to protect your users' data is to not store it in the first place, if you can. :)

like image 157
Nick Johnson Avatar answered Oct 25 '22 00:10

Nick Johnson