I'm currently working on a project that involves the daily extraction of data (pharmacy records) from a Visual FoxPro database and the upload of some of it to a WordPress site, where clients of the pharmacy can securely view it. I would like some advice on the general methodology of my software - I am able to code it, but need to know whether I'm going about it the right way. I'm writing both the PC software (in C#/.NET 4.5) and the PHP WordPress plugin.
The process I plan to use for encrypting the data server-side is based on this article. Summarised, it advocates encrypting each user's data asymmetrically with their own public key, which is stored on the server. The private key that decrypts this data is itself encrypted symmetrically using the user's password, and stored alongside it. This way, even if the database is stolen, each user's password must be broken before their private key - and then their data - can be recovered, and the process has to be repeated for every user.
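To illustrate the scheme as I understand it, here is a minimal PHP sketch of the registration step (the function name and return shape are mine, not from the article): openssl_pkey_new() generates the keypair, and openssl_pkey_export() symmetrically encrypts the private-key PEM with a passphrase - here, the user's password.

    <?php
    // Registration: generate a per-user keypair and wrap the private key
    // with the user's password. Names here are illustrative.
    function create_user_keys(string $password): array
    {
        $key = openssl_pkey_new([
            'private_key_bits' => 2048,
            'private_key_type' => OPENSSL_KEYTYPE_RSA,
        ]);

        // Private key PEM, symmetrically encrypted with the password.
        openssl_pkey_export($key, $encryptedPrivatePem, $password);

        // Public key PEM, stored in the clear - it only encrypts.
        $publicPem = openssl_pkey_get_details($key)['key'];

        return [$publicPem, $encryptedPrivatePem]; // both go in the database
    }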
The only weakness, pointed out by the author himself, and the main point of my question, is that while the user is logged in, the decrypted key is stored in session storage. The article suggests dealing with this by simply limiting how long the user stays logged in. I thought a better solution would be to store that key in a short-lived secure cookie (the whole process happens over HTTPS, of course). That way, if the attacker controls the user's computer and can read their cookies, they can probably just keylog the password and log in anyway - no need to steal the database - while if the attacker gains access to the server, they cannot decrypt the HTTPS traffic (or can they? I'm not sure).
Should I use secure cookies or session storage to temporarily store the decrypted key?
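For reference, the cookie variant I have in mind would look something like this in PHP - just the cookie mechanics, assuming the private key has already been decrypted at login; the ten-minute lifetime is an arbitrary example:

    <?php
    // After a successful login, hand the decrypted private key back to the
    // browser in a short-lived cookie instead of keeping it in the session.
    // In practice base64-encode the PEM first - cookies cannot hold newlines.
    setcookie('user_privkey', base64_encode($decryptedPrivatePem), [
        'expires'  => time() + 600,   // ten minutes, for example
        'path'     => '/',
        'secure'   => true,           // sent over HTTPS only
        'httponly' => true,           // not readable from JavaScript
        'samesite' => 'Strict',
    ]);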
The second thing I still want to work out is how to store the data - this is more of an efficiency problem. Since every user has their own encryption key, it follows that each user's records must be stored separately. I don't know whether I should store a "block" of data for every user - encrypted JSON containing an array of record objects - or store the records in a table with the actual data structure and encrypt each field separately with the key.
I am leaning towards storing the data as one block - it seems more efficient to decrypt one big block of data than perhaps several thousand separate fields. Also, even if I stored the data in its proper structure, I still wouldn't be able to use MySQL's WHERE, ORDER BY etc., since the data would all be BLOBs.
Should I store the data as a big block per user, or separated into the different fields?
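To make the block option concrete: RSA alone can't encrypt a large JSON blob, so I'd need a hybrid envelope, and PHP's openssl_seal() and openssl_open() provide exactly that (a random AES key encrypted to the user's public key). A rough sketch of what I have in mind, with illustrative names:

    <?php
    // Encrypt one user's whole record set as a single blob.
    function seal_user_block(array $records, string $publicPem): array
    {
        $json = json_encode($records);
        openssl_seal($json, $sealed, $envKeys, [$publicPem], 'AES-256-CBC', $iv);
        // Store all three pieces: the sealed blob, the enveloped AES key
        // for this user, and the IV.
        return ['blob' => $sealed, 'env_key' => $envKeys[0], 'iv' => $iv];
    }

    // Decrypt it again at login, using the password-unlocked private key.
    function open_user_block(array $row, string $privatePem, string $password): array
    {
        $priv = openssl_pkey_get_private($privatePem, $password);
        openssl_open($row['blob'], $json, $row['env_key'], $priv, 'AES-256-CBC', $row['iv']);
        return json_decode($json, true);
    }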
I extract the data from the DBF file and essentially make a "diff": I compare the current extracted data with the previous day's data, and only upload the blocks of the users whose data has changed (I can't upload individual records, as I will probably end up storing the users' data in blocks). I also include "delete" instructions for users who have been deleted. This is because there are hundreds of thousands of records in the database, totalling over 200 MB, and the size increases every day.
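The diff itself can be as simple as comparing each user's serialised records between runs. My extractor is C#, but the logic is language-independent; here it is sketched in PHP for consistency with the other snippets (all names illustrative):

    <?php
    // $today and $yesterday map user ID => JSON-serialised records.
    function make_diff(array $today, array $yesterday): array
    {
        $changed = $deleted = [];
        foreach ($today as $userId => $json) {
            if (!isset($yesterday[$userId]) || $yesterday[$userId] !== $json) {
                $changed[$userId] = $json;   // new or modified block
            }
        }
        foreach (array_keys($yesterday) as $userId) {
            if (!isset($today[$userId])) {
                $deleted[] = $userId;        // becomes a "delete" instruction
            }
        }
        return ['changed' => $changed, 'deleted' => $deleted];
    }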
My current plan is to write all this data to a JSON file, gzip it and upload it to the server. My question is: how do I do that while ensuring the security of the data? Naturally, the upload will happen over HTTPS, and I have an API password in place so that only authorised uploads are accepted, but my main concern is how to protect the data if the server is compromised. I don't want an attacker to simply grab the JSON file from the server while it's being processed. One idea I had was to have the server send me the users' public keys and perform the encryption in my software before the upload - that seems to me like the only way of protecting the data. I could encrypt the whole JSON file, perhaps with an API key or a special password, but that's moot if the attacker can access the decrypted file while it's being processed on the server. Is that a good solution?
Should I encrypt the data individually client-side, or is there a way to securely transfer it to the server and encrypt it there?
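A sketch of that client-side flow: fetch the public keys, seal each changed block before it leaves my machine, then upload the gzipped result. The endpoint URL is invented for illustration, and seal_user_block() and make_diff() are the helpers sketched above:

    <?php
    // Hypothetical endpoint; substitute the plugin's real API route.
    $pubKeys = json_decode(
        file_get_contents('https://example.com/wp-json/pharmacy/v1/public-keys'),
        true
    ); // user ID => public key PEM

    $diff    = make_diff($today, $yesterday);
    $payload = ['changed' => [], 'deleted' => $diff['deleted']];
    foreach ($diff['changed'] as $userId => $json) {
        // Encrypted before upload - the server never sees plaintext.
        // base64-encode the binary pieces so they survive JSON encoding.
        $payload['changed'][$userId] = array_map(
            'base64_encode',
            seal_user_block(json_decode($json, true), $pubKeys[$userId])
        );
    }

    $body = gzencode(json_encode($payload));
    // POST $body over HTTPS with the API password in a header...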
Thanks in advance for any answers, I'd love to hear from someone who's dealt with problems like this before.
Note: cross-posted to Programmers, see comments.
As it happens, I am working on a similar system to encrypt personal details (email, IP) in WordPress comments, so that if the server is compromised, sensitive data in the database is still encrypted. Storing an asymmetric decryption key in the session was out for me, since that could leave the key on the server for an attacker to grab at the same time as compromising it.
So, cookies over SSL are a better way to go - at least the attacker then has to wait for a user to log in before they can steal their key(s). In tandem with this, some sort of tripwire system would be a good idea, so that users cannot log onto the system (thus handing their keys to the waiting attacker) once it has been compromised.
As you say, encrypting records (either with one key as per my design, or many keys as per yours) means that searching through records becomes a process you have to move away from your database server, which in turn means it will be significantly slower.
You may be able to mitigate this by making a trade-off between speed and security: some fields can be fuzzed and then stored unencrypted. For example, if you want to search where your patients are located, get the (lat, long) from their address, apply a random shift (say up to 3 miles on each axis in either direction) and then store the resulting coordinates in plain text. Approximate count queries relating to location can then be done without decryption.
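The shift is cheap to compute; a rough sketch (the degrees-per-mile factors are approximations, and the exact radius is of course up to you):

    <?php
    // Shift a coordinate by up to ±3 miles on each axis before storing it.
    function fuzz_location(float $lat, float $lng, float $maxMiles = 3.0): array
    {
        $shift = fn() => random_int(-1000, 1000) / 1000 * $maxMiles;
        $lat  += $shift() / 69.0;                          // ~69 miles per degree of latitude
        $lng  += $shift() / (69.17 * cos(deg2rad($lat)));  // a degree of longitude shrinks with latitude
        return [round($lat, 4), round($lng, 4)];
    }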
The above looks at how to mitigate attacks against the server, which is your greatest risk, since all your records are stored there. As you rightly point out, though, attacks on client machines are also a concern, and if the clients are members of the public then their security processes can be assumed to be non-existent.
On that basis you could strengthen a single password (which is given in its entirety) with a passphrase from which the client is asked to select three random letters (i.e. it is specifically not given in its entirety). This defends elegantly against keyloggers in two ways: drop-down menus are used, which are harder to eavesdrop on, and even if the user uses keyboard shortcuts, they have not supplied the full phrase. At each successful logon, the indexes of the requested letters (e.g. 1, 4 and 5) are recorded and not asked for again for a long period. Obviously, too many wrong answers cause the account to be locked out, requiring reauthorisation via a phone call or a snail-mail reset code.
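A minimal sketch of the challenge and verification steps. Note the trade-off inherent to partial-phrase schemes: since you only ever receive three characters, the passphrase must be stored reversibly (or per-position), not as a single one-way hash:

    <?php
    // Pick three distinct positions the user must answer for this login.
    function pick_positions(string $passphrase, int $count = 3): array
    {
        $positions = range(0, strlen($passphrase) - 1);
        shuffle($positions);
        return array_slice($positions, 0, $count); // e.g. [1, 4, 5]
    }

    // Verify the characters submitted via the drop-down menus.
    function verify_letters(string $passphrase, array $positions, array $answers): bool
    {
        foreach ($positions as $i => $pos) {
            if (!hash_equals($passphrase[$pos], $answers[$i])) {
                return false;
            }
        }
        return true;
    }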
Other authentication methods you could use: text the user an additional passphrase every time they enter the correct password, or (probably prohibitively expensive) use an authentication device as per online banking.
Another tip for security is to store as little personal information as possible. If you can do without the ability to reset passwords immediately via email, then name, address, telephone numbers and email - all personally identifying data - are perhaps unnecessary. That personal information can be stored separately in a disconnected database on another server, using a common primary key to link the two together. (In fact, if the user wishes to reset their password, you could simply store a flag against their anonymous user record, and the pharmacist could run the reset process manually on their firewalled machine the next time they visit an admin panel.)
Should you encrypt tabular data in one blob, or leave it in each column? I've looked at this in my application as well. I stored it in one blob, since my use-case is search-intensive, and having N decrypts per row rather than one made the decision easy. That said, you may prefer the tidiness of encrypting columns individually, and one could argue that if corruption creeps in, separating them out gives you a better chance that some of the row will survive.
If you decide to store the data in a single blob, I am using a format similar to this (rows separated with newlines prior to being asymmetrically encrypted):

    1.2          <-- version of this format, so I can add things in the future
    key1=value1
    key2=value2
    ...
If you have several processes writing to columns, make sure you lock rows between read and write, otherwise (as hinted above) you can lose some of your data.
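In MySQL (InnoDB) that means wrapping the read-modify-write in a transaction with SELECT ... FOR UPDATE. A sketch using PDO, with illustrative table and column names (merge_changes() is a hypothetical placeholder for your decrypt/merge/re-encrypt step):

    <?php
    // Read-modify-write one user's blob without losing concurrent updates.
    $pdo->beginTransaction();
    $stmt = $pdo->prepare(
        'SELECT data FROM user_blocks WHERE user_id = ? FOR UPDATE'
    );
    $stmt->execute([$userId]);
    $blob = $stmt->fetchColumn();      // row is now locked until commit

    $newBlob = merge_changes($blob);   // hypothetical re-encryption step

    $pdo->prepare('UPDATE user_blocks SET data = ? WHERE user_id = ?')
        ->execute([$newBlob, $userId]);
    $pdo->commit();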
As you say, this could equally be JSON, if that format is better for you.
My understanding of this question is: how do you replicate to an unencrypted offline copy, given that you cannot decrypt user records yourself? I wonder whether you could relax your security constraints a little, store a common public key on the server, and keep a separate record of changes encrypted with that common key. This would populate a table which should periodically be emptied (by running a sync routine on a remote secure machine); thus, the value of the changes table to an attacker will be small compared to obtaining the whole database unencrypted.
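Concretely, each write could be sealed to the common public key and appended to a changes table; only the machine holding the matching private key can read it. A sketch, with an illustrative table name:

    <?php
    // Append one change, sealed to the common public key, to the sync table.
    function log_change(PDO $pdo, string $commonPublicPem, array $change): void
    {
        openssl_seal(json_encode($change), $sealed, $ekeys, [$commonPublicPem],
                     'AES-256-CBC', $iv);
        $pdo->prepare('INSERT INTO pending_changes (blob, env_key, iv) VALUES (?, ?, ?)')
            ->execute([$sealed, $ekeys[0], $iv]);
        // The offline sync routine decrypts these rows with the matching
        // private key, applies them, and empties the table.
    }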
The corresponding private key, of course, should be on the pharmacist's computer, again securely firewalled from the internet.
The risk with this design is that an attacker replaces the server's public key with one of their own, so that they can later collect information that has effectively been encrypted just for them! However, as long as you've installed a tripwire on the server, this can reasonably be defended against: if it is triggered, the dynamic part of the web application won't write any new changes (in fact it won't work at all) until the system has been scanned and determined to be safe.
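The gate itself can be trivial - a flag checked before any write; what sets the flag (file-integrity monitoring, repeated failed logins, etc.) is up to you. A sketch, with invented flag and table names:

    <?php
    // Refuse all writes while the tripwire flag is set.
    function assert_not_tripped(PDO $pdo): void
    {
        $tripped = $pdo->query(
            "SELECT value FROM site_flags WHERE name = 'tripwire'"
        )->fetchColumn();
        if ($tripped) {
            http_response_code(503);
            exit('System locked pending a security review.');
        }
    }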