How to implement password protection for individual files?

I'm writing a little desktop app that should be able to encrypt a data file and protect it with a password (i.e. one must enter the correct password to decrypt). I want the encrypted data file to be self-contained and portable, so the authentication has to be embedded in the file (or so I assume).

I have a strategy that appears workable and seems logical based on what I know (which is probably just enough to be dangerous), but I have no idea if it's actually a good design or not. So tell me: is this crazy? Is there a better/best way to do it?

Step 1: User enters plain-text password, e.g. "MyDifficultPassword"
Step 2: App hashes the user-password and uses that value as the symmetric key to encrypt/decrypt the data file. e.g. "MyDifficultPassword" --> "HashedUserPwdAndKey".
Step 3: App hashes the hashed value from step 2 and saves the new value in the data file header (i.e. the unencrypted part of the data file) and uses that value to validate the user's password. e.g. "HashedUserPwdAndKey" --> "HashedValueForAuthentication"
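For readers who want to see the three steps concretely, here is a minimal sketch of the scheme exactly as described. The question does not name a hash function, so SHA-256 is an assumption, and the function names are made up for illustration. It is unsalted and uses a single hash round, which is the part the answers address.

```python
import hashlib
import hmac

def derive_key(password: str) -> bytes:
    # Step 2: hash the password; the digest doubles as the symmetric key.
    # SHA-256 is an assumption, the question does not specify a hash.
    return hashlib.sha256(password.encode("utf-8")).digest()

def derive_verifier(key: bytes) -> bytes:
    # Step 3: hash the key again; this value is stored in the
    # unencrypted file header.
    return hashlib.sha256(key).digest()

def password_matches(candidate: str, stored_verifier: bytes) -> bool:
    # Recompute both hashes from the candidate password and compare
    # in constant time against the header value.
    recomputed = derive_verifier(derive_key(candidate))
    return hmac.compare_digest(recomputed, stored_verifier)
```

The verifier in the header differs from the key, so reading the header does not directly reveal the key, but anyone holding the file can still test guessed passwords against it quickly.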

Basically I'm extrapolating from the common way to implement web-site passwords (when you're not using OpenID, that is), which is to store the (salted) hash of the user's password in your DB and never save the actual password. But since I use the hashed user password for the symmetric encryption key, I can't use the same value for authentication. So I hash it again, basically treating it just like another password, and save the doubly-hashed value in the data file. That way, I can take the file to another PC and decrypt it by simply entering my password.

So is this design reasonably secure, or hopelessly naive, or somewhere in between? Thanks!

EDIT: clarification and follow-up question re: Salt.
I thought the salt had to be kept secret to be useful, but your answers and links imply this is not the case. For example, this spec linked by erickson (below) says:

Thus, password-based key derivation as defined here is a function of a password, a salt, and an iteration count, where the latter two quantities need not be kept secret.

Does this mean that I could store the salt value in the same place/file as the hashed key and still be more secure than if I used no salt at all when hashing? How does that work?

A little more context: the encrypted file isn't meant to be shared with or decrypted by others, it's really single-user data. But I'd like to deploy it in a shared environment on computers I don't fully control (e.g. at work) and be able to migrate/move the data by simply copying the file (so I can use it at home, on different workstations, etc.).

574

asked Sep 11 '08 05:09

Matt

2 Answers

Key Generation

I would recommend using a recognized algorithm such as PBKDF2 defined in PKCS #5 version 2.0 to generate a key from your password. It's similar to the algorithm you outline, but is capable of generating longer symmetric keys for use with AES. You should be able to find an open-source library that implements PBE key generators for different algorithms.
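As a rough illustration, Python's standard library exposes PBKDF2 through hashlib.pbkdf2_hmac. The sketch below assumes HMAC-SHA-256 as the underlying function, a 16-byte random salt, and an iteration count of 200,000. Those parameters are my assumptions rather than anything PKCS #5 mandates, and the helper name derive_key is hypothetical.

```python
import hashlib
import os

def derive_key(password: str, salt: bytes,
               iterations: int = 200_000, length: int = 32) -> bytes:
    # PBKDF2 with HMAC-SHA-256; `length` sets the key size in bytes
    # (32 bytes would suit AES-256). Iteration count is an assumed value.
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=length
    )

# The salt is random per file and is stored alongside the ciphertext.
salt = os.urandom(16)
key = derive_key("MyDifficultPassword", salt)
```

The same password and salt always give the same key, so the file can be decrypted on another machine as long as the salt and iteration count travel with it.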

File Format

You might also consider using the Cryptographic Message Syntax as a format for your file. This will require some study on your part, but again there are existing libraries to use, and it opens up the possibility of inter-operating more smoothly with other software, like S/MIME-enabled mail clients.

Password Validation

Regarding your desire to store a hash of the password: if you use PBKDF2 to generate the key, you could use a standard password hashing algorithm (big salt, a thousand rounds of hashing) for the stored validation hash, so that the key and the stored hash are different values.
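One way to get two unrelated values from the same password is to run the derivation twice with independent salts, one for the key and one for the stored hash. The sketch below uses PBKDF2 for both as a stand-in for "a standard password hashing algorithm". The header layout, field names, and iteration count are assumptions for illustration only.

```python
import hashlib
import hmac
import os
from typing import Optional

def _pbkdf2(password: str, salt: bytes, iterations: int) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)

def make_header(password: str, iterations: int = 200_000) -> dict:
    # Two independent random salts: one for the encryption key,
    # one for the password verifier stored in the clear.
    key_salt = os.urandom(16)
    verifier_salt = os.urandom(16)
    return {
        "iterations": iterations,
        "key_salt": key_salt,
        "verifier_salt": verifier_salt,
        "verifier": _pbkdf2(password, verifier_salt, iterations),
    }

def unlock(password: str, header: dict) -> Optional[bytes]:
    # Check the password against the stored verifier first; only derive
    # the encryption key if it matches.
    candidate = _pbkdf2(password, header["verifier_salt"], header["iterations"])
    if not hmac.compare_digest(candidate, header["verifier"]):
        return None
    return _pbkdf2(password, header["key_salt"], header["iterations"])
```

Because the salts differ, the verifier in the header gives no shortcut to the key, although it still lets anyone with the file test password guesses at the cost of one full derivation each.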

Alternatively, you could compute a MAC on the content. A hash collision on a password is more likely to be useful to an attacker; a hash collision on the content is likely to be worthless. The MAC would still serve to let a legitimate recipient know that the wrong password was used for decryption.
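A sketch of the MAC approach, using HMAC-SHA-256 from Python's standard library. Two things here are my assumptions: the MAC key is derived from the password with PBKDF2 and its own salt, and the MAC is computed over the stored (encrypted) bytes. The function names are hypothetical.

```python
import hashlib
import hmac

def mac_key_from_password(password: str, salt: bytes,
                          iterations: int = 200_000) -> bytes:
    # Assumed derivation: a separate PBKDF2 run with a salt used only
    # for the MAC key.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)

def compute_tag(mac_key: bytes, data: bytes) -> bytes:
    # HMAC-SHA-256 over the file content; the tag is stored with the file.
    return hmac.new(mac_key, data, hashlib.sha256).digest()

def tag_is_valid(mac_key: bytes, data: bytes, tag: bytes) -> bool:
    # Constant-time comparison of the recomputed tag and the stored tag.
    return hmac.compare_digest(compute_tag(mac_key, data), tag)
```

A wrong password yields a different MAC key, so the tag check fails before any decrypted garbage reaches the user. The same check also fails if the file content was modified.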

Cryptographic Salt

Salt helps to thwart pre-computed dictionary attacks.

Suppose an attacker has a list of likely passwords. He can hash each and compare it to the hash of his victim's password, and see if it matches. If the list is large, this could take a long time. He doesn't want to spend that much time on his next target, so he records the result in a "dictionary" where a hash points to its corresponding input. If the list of passwords is very, very long, he can use techniques like a Rainbow Table to save some space.

However, suppose his next target salted their password. Even if the attacker knows what the salt is, his precomputed table is worthless—the salt changes the hash resulting from each password. He has to re-hash all of the passwords in his list, affixing the target's salt to the input. Every different salt requires a different dictionary, and if enough salts are used, the attacker won't have room to store dictionaries for them all. Trading space to save time is no longer an option; the attacker must fall back to hashing each password in his list for each target he wants to attack.

So, it's not necessary to keep the salt secret. Ensuring that the attacker doesn't have a pre-computed dictionary corresponding to that particular salt is sufficient.
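The effect can be shown with a toy example. The sketch below uses a single round of SHA-256 and a three-entry candidate list purely to keep it short; the helper names and the way the salt is prepended to the password are assumptions for illustration.

```python
import hashlib
import os

def hash_password(password: str, salt: bytes = b"") -> str:
    # Toy construction: hash of salt followed by password.
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()

candidates = ["letmein", "password", "MyDifficultPassword"]

# Built once, reusable against every target that does not use a salt.
precomputed = {hash_password(p): p for p in candidates}

unsalted_target = hash_password("password")
salt = os.urandom(16)  # public, stored next to the hash
salted_target = hash_password("password", salt)

def crack_with_salt(target_hash: str, salt: bytes, candidates: list):
    # Knowing the salt, the attacker can still guess, but must re-hash
    # every candidate for this specific target.
    for p in candidates:
        if hash_password(p, salt) == target_hash:
            return p
    return None
```

Looking up unsalted_target in precomputed finds the password immediately, while salted_target is not in the table at all. crack_with_salt still recovers a weak password, which is why the salt only removes the precomputation shortcut and does not replace a strong password or a slow derivation function.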

171

answered Sep 27 '22 19:09

erickson

As Niyaz said, the approach sounds reasonable if you use a quality implementation of strong algorithms, like SHA-256 and AES for hashing and encryption. Additionally, I would recommend using a salt to make it harder to build a dictionary of all password hashes.

Of course, reading Bruce Schneier's Applied Cryptography is never wrong either.

answered Sep 27 '22 18:09

David Schmitt
