Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to store 'query-able' encrypted data in MySQL?

I need a way to store encrypted data, in a way that I can still run queries. Is that even possible?

At least I need an encryption algorythm that always returns the same string for the same input, so I can find all the users named 'John' by encrypting that string and looking for the encrypted result in the DB. In PHP, mcrypt always returns different strings (I know this is on purpose, to improve security).

Any ideas?

like image 496
HappyDeveloper Avatar asked Oct 26 '11 15:10

HappyDeveloper


2 Answers

Depends on how you're storing that 'John' name. If it's the ONLY thing in a particular field, you can do something like

SELECT ...
FROM sometable
WHERE cryptedfirstname = AES_ENCRYPT('John', $key)

If 'John' is part of a larger string ('John Doe' or 'King John, Ruler of the Universe'), then you'll have to decrypt the full field and match against that

SELECT ...
FROM sometime
WHERE INSTR(AES_DECRYPT(cryptedFULLame, $key), 'John') > 0

Note that I'm embedding the decryption key in the query. Bad idea for a production system, but this is just an example.

You won't be able to do something like:

...
WHERE INSTR(cryptedFULLname, AES_ENCRYPT('John', $key))

due to how AES and most other userful/decent crypto systems work.

like image 110
Marc B Avatar answered Sep 16 '22 20:09

Marc B


It sounds like you understand this, but it should be emphasized that an encryption algorithm that always produces the same cipher text for a given plain text is broken. It leads to all kinds of attacks.

For example, an attacker with access to your database and the application can choose the value "John" for a field, and cause the application to store it in the database. Then he can look at the cipher text for his record, and identify any other records that contain that cipher text. He doesn't need to obtain the key for this.

An exception would be if you are encrypting large, "unpredictable" unique numbers, for example, session identifiers or UUIDs. In this case, since the plain texts don't recur, and valid plaintexts can't be predicted by an attacker, ciphertext indistinguishability is not required.

Any symmetric cipher used in ECB mode will produce consistent ciphertext from a plaintext, as will using modes that take an initialization vector if you always use the same IV. It's just generally not a good idea to do this.

like image 38
erickson Avatar answered Sep 18 '22 20:09

erickson