I've been told that you should not store the user's password in a database, but how can I authenticate users if I cannot save their passwords? Is simply encrypting them enough to keep them safe?
There have been several stories in the news lately of high-profile sites that have been compromised, like LinkedIn, and I don't think such a high-profile site would store plain-text passwords, so I would assume they were encrypted.
Disclaimer: I originally posted this on Quora but felt that the answer was better suited to Stack Overflow.
The way to check user passwords without actually keeping the passwords is to store a hash of each password and compare the hash of the user's input to the stored hash.
What is hashing?
Hashing is the process of passing data of variable length (small passwords, big passwords, binary files, whatever) through an algorithm that returns a value of fixed length called a hash value. Hashes only work one way: you cannot recover the input from the hash. An *.img file of several MB is hashed in exactly the same way as a password. (In fact, it is common practice to hash large files to check their integrity: say you download a file using BitTorrent; when the download completes, the software hashes the file and compares that hash with the hash of what you were supposed to receive. If they match, the download is not corrupt.)
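To make the fixed-length point concrete, here is a minimal Python sketch using the standard hashlib module. The helper name sha256_hex and the sample inputs are made up for illustration; sha256 is just one of the available algorithms.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


# A short password and roughly 5 MB of data (both inputs are illustrative).
short_digest = sha256_hex(b"pass123")
long_digest = sha256_hex(b"x" * 5_000_000)

# Both digests are 64 hex characters (256 bits), whatever the input size.
print(len(short_digest), len(long_digest))
```

The same call works on a password or on the contents of a large file, which is why hashes are also handy for integrity checks.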
How does auth with hashes work?
When the user registers he gives a password, say pass123, that is then hashed (by any of the available hashing algorithms: sha1, sha256, etc.; in this case md5) to the value 32250170a0dca92d53ec9624f336ca24, and that value is stored in the database. Every time you try to log in, the system hashes your password in real time and compares it to the stored hash; if it matches, you're good to go. You can try an online md5 hasher here: http://md5-hash-online.waraxe.us/
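The register-then-compare flow can be sketched roughly like this. Everything here is an assumption for illustration: the in-memory users dict stands in for the database, the function names are hypothetical, and sha256 is swapped in for the md5 used in the text.

```python
import hashlib
import hmac

# Hypothetical in-memory "database" mapping username -> stored hash.
users: dict[str, str] = {}


def hash_password(password: str) -> str:
    """Hash a password (sha256 here; the text's example uses md5)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def register(username: str, password: str) -> None:
    # Only the hash is stored; the password itself is never saved.
    users[username] = hash_password(password)


def login(username: str, password: str) -> bool:
    stored = users.get(username)
    if stored is None:
        return False
    # Re-hash the attempt and compare it to the stored hash.
    return hmac.compare_digest(stored, hash_password(password))


register("alice", "pass123")
print(login("alice", "pass123"))  # True
print(login("alice", "pass321"))  # False
```

Note that this version has no salt yet, so it only shows the basic idea.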
What if two hashes are the same? Could a user log in with a different password?
He could! That is called a collision. Say that on a fictional hashing algorithm the value pass123 would produce the hash ec9624 and the value pass321 would produce the exact same hash; that hashing algorithm would be broken. Both common algorithms md5 and sha1 (the one LinkedIn used) are considered broken because collisions have been found. Being broken in this sense does not necessarily mean the algorithm is unsafe for every purpose.
How can you exploit collisions?
If you can produce an input whose hash is the same as the hash generated from the user's password, you can identify yourself to that site as that user.
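As a toy illustration of a collision, the sketch below uses a deliberately weak, made-up hash (the first 4 hex characters of sha256, so only 65,536 possible values) and searches for two different inputs that share a hash. The names toy_hash and find_collision are hypothetical. Keep in mind this finds any colliding pair, which is much easier than matching one specific user's stored hash.

```python
import hashlib
from itertools import count


def toy_hash(text: str) -> str:
    """Deliberately weak hash: only the first 4 hex chars of SHA-256."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:4]


def find_collision() -> tuple[str, str]:
    """Try cand0, cand1, ... until two candidates share a toy hash."""
    seen: dict[str, str] = {}
    for i in count():
        candidate = f"cand{i}"
        digest = toy_hash(candidate)
        if digest in seen:
            # A different, earlier candidate already produced this hash.
            return seen[digest], candidate
        seen[digest] = candidate


first, second = find_collision()
print(first, second, toy_hash(first))
```

A site that stored toy_hash(first) would accept second as the password, because the comparison only ever sees the hash.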
Rainbow table attacks
Crackers quickly understood that once they had captured a table of hashed passwords it would not be feasible to crack the passwords one by one, so they devised a new attack vector. They would generate every possible password up to some length (aaa, aab, aac, aad, etc.) and store all the hashes in a database. Then they would only need to search for the stolen hash in that database of sequentially generated hashes (a sub-second query) and get the corresponding password.
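The precomputation described above can be sketched as a plain lookup table. This is a simplification: a true rainbow table compresses the data using chains of hashes, while the sketch stores every hash directly. The names build_table and sha1_hex and the "stolen" hash are assumptions for illustration.

```python
import hashlib
from itertools import product
from string import ascii_lowercase


def sha1_hex(text: str) -> str:
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def build_table(length: int) -> dict[str, str]:
    """Map hash -> password for every lowercase password of this length."""
    table: dict[str, str] = {}
    for letters in product(ascii_lowercase, repeat=length):
        password = "".join(letters)
        table[sha1_hex(password)] = password
    return table


# 26**3 = 17,576 entries: aaa, aab, aac, ...
table = build_table(3)

# Stands in for one hash taken from a leaked, unsalted table.
stolen_hash = sha1_hex("cat")
print(table.get(stolen_hash))  # cat
```

The expensive part (building the table) is done once; after that each stolen hash costs a single dictionary lookup.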
Salt to the rescue (and where LinkedIn failed big!)
Security is defined by the amount of time it will take a cracker to break your password and the frequency with which you change it. With rainbow tables, security drops really fast, so the industry came up with salt. What if every password had a unique twist? That's salt! For every user that registers you generate a random string, say 3 characters (the industry recommends 16 chars - https://stackoverflow.com/a/18419...). Then you concatenate the user's password with your random string and hash the result.
password - salt - sha1 hash
qwerty - 123 - 5cec175b165e3d5e62c9e13ce848ef6feac81bff
qwerty - 321 - b8b92ab870c50ce5fc59571dc0c77f9a4a90323c
qazwsx - abc - c6aec64efe2a25c6bc35aeea2aafb2e86ac96a0c
qazwsx - cba - 31e42c24f71dc5a453b2635e6ec57eadf03090fd
As you can see, the exact same passwords, given different salt values, generate completely different hashes. That is the purpose of salt and why LinkedIn failed big. Notice that in the table you store only the hash and the salt! Never the password!
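A minimal sketch of that scheme in Python, assuming the salt is simply appended to the password before hashing with sha1. The function names are hypothetical, and the 16-character salt comes from secrets.token_hex(8).

```python
import hashlib
import hmac
import secrets


def hash_with_salt(password: str, salt: str) -> str:
    # Concatenate password and salt, then hash the result with sha1.
    return hashlib.sha1((password + salt).encode("utf-8")).hexdigest()


def create_record(password: str) -> tuple[str, str]:
    """Return the (salt, hash) pair to store; the password is not kept."""
    salt = secrets.token_hex(8)  # 16 random hex characters
    return salt, hash_with_salt(password, salt)


def verify(password: str, salt: str, stored_hash: str) -> bool:
    # Re-hash the attempt with the stored salt and compare.
    return hmac.compare_digest(stored_hash, hash_with_salt(password, salt))


# Two users choose the same password but get different salts.
salt_a, hash_a = create_record("qwerty")
salt_b, hash_b = create_record("qwerty")

print(hash_a != hash_b)
print(verify("qwerty", salt_a, hash_a))
```

This only demonstrates salting. Real systems usually pair the salt with a deliberately slow function such as hashlib.pbkdf2_hmac rather than a single fast hash.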
The first thing the guys who got their hands on the LinkedIn hashes did was to sort the hashes and see if there were matches (there were, because multiple users had the same password - shame on them!). Those users were the first to drop. Had the password table been salted, none of that would have happened, and the crackers would have needed an excruciating amount of time (and computer resources) to crack every single password. That would have given LinkedIn plenty of time to enforce a new password policy.
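The duplicate-spotting step is easy to reproduce. In the sketch below the user names and passwords are made up, and sha1_hex is a hypothetical helper; it counts repeated hashes first without salt and then with a random salt per user.

```python
import hashlib
import secrets
from collections import Counter


def sha1_hex(text: str) -> str:
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


# Hypothetical users and passwords, made up for illustration.
passwords = {"ann": "qwerty", "bob": "qwerty", "cid": "qazwsx", "dee": "qwerty"}

# Unsalted: identical passwords give identical hashes, so they stand out.
unsalted = [sha1_hex(p) for p in passwords.values()]
most_common_hash, occurrences = Counter(unsalted).most_common(1)[0]
print(occurrences)  # 3

# Salted: each user gets a random salt, so no hash repeats.
salted = [sha1_hex(p + secrets.token_hex(8)) for p in passwords.values()]
print(max(Counter(salted).values()))  # 1
```

Once the most common unsalted hash is cracked, every account sharing it falls at the same time; with per-user salts each hash has to be attacked separately.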
Hope the technical side of the answer gave insight as to how authentication works (or should work).