Should I Use Base64 or Unicode for Storing Hashes & Salts?

Tags:

I have never worked on the security side of web apps, as I am just out of college. Now, I am looking for a job and working on some websites on the side, to keep my skills sharp and gain new ones. One site I am working on is pretty much copied from the original MEAN stack from the guys that created it, but trying to understand it and do things better where I can.

To compute the hash & salt, the creators used PBKDF2. I am not interested in hearing about arguments for or against PBKDF2, as that is not what this question is about. They seem to have used buffers for everything here, which I understand is a common practice in node. What I am interested in are their reasons for using base64 for the buffer encoding, rather than simply using UTF-8, which is an option with the buffer object. Most computers nowadays can handle many of the characters in Unicode, if not all of them, but the creators could have chosen to encode the passwords in a subset of Unicode without restricting themselves to the 65 characters of base64.

By "the choice between encoding as UTF-8 or base64", I mean transforming the binary of the hash, computed from the password, into the given encoding. node.js specifies a couple ways to encode binary data into a Buffer object. From the documentation page for the Buffer class:

Pure JavaScript is Unicode friendly but not nice to binary data. When dealing with TCP
streams or the file system, it's necessary to handle octet streams. Node has several
strategies for manipulating, creating, and consuming octet streams.

Raw data is stored in instances of the Buffer class. A Buffer is similar to an array
of integers but corresponds to a raw memory allocation outside the V8 heap. A Buffer
cannot be resized.

What the Buffer class does, as I understand it, is take some binary data and calculate the value of each 8 (usually) bits. It then converts each set of bits into a character corresponding to its value in the encoding you specify. For example, if the binary data is 00101100 (8 bits), and you specify UTF-8 as the encoding, the output would be , (a comma). This is what anyone looking at the output of the buffer would see when looking at it with a text editor such as vim, as well as what a computer would "see" when "reading" them. The Buffer class has several encodings available, such as UTF-8, base64, and binary.

I think they felt that, while storing any UTF-8 character imaginable in the hash, as they would have to do, would not phase most modern computers, with their gigabytes of RAM and terabytes of space, actually showing all these characters, as they may want to do in logs, etc., would freak out users, who would have to look at weird Chinese, Greek, Bulgarian, etc. characters, as well as control characters, like the Ctrl button or the Backspace button or even beeps. They would never really need to make sense of any of them, unless they were experienced users testing PBKDF2 itself, but the programmer's first duty is to not give any of his users a heart attack. Using base64 increases the overhead by about a third, which is hardly worth noting these days, and decreases the character set, which does nothing to decrease the security. After all, computers are written completely in binary. As I said before, they could have chosen a different subset of Unicode, but base64 is already standard, which makes it easier and reduces programmer work.

Am I right about the reasons why the creators of this repository chose to encode its passwords in base64, instead of all of Unicode? Is it better to stick with their example, or should I go with Unicode or a larger subset of it?

381

asked Nov 17 '14 21:11

trysis

2 Answers

A hash value is a sequence of bytes. This is binary information. It is not a sequence of characters.

UTF-8 is an encoding for turning sequences of characters into sequences of bytes. Storing a hash value "as UTF-8" makes no sense, since it is already a sequence of bytes, and not a sequence of characters.

Unfortunately, many people have took to the habit of considering a byte as some sort of character in disguise; it was at the basis of the C programming language and still infects some rather modern and widespread frameworks such as Python. However, only confusion and sorrow lie down that path. The usual symptoms are people wailing and whining about the dreadful "character zero" -- meaning, a byte of value 0 (a perfectly fine value for a byte) that, turned into a character, becomes the special character that serves as end-of-string indicator in languages from the C family. This confusion can even lead to vulnerabilities (the zero implying, for the comparison function, an earlier-than-expected termination).

Once you have understood that binary is binary, the problem becomes: how are we to handle and store our hash value ? In particular in JavaScript, a language that is known to be especially poor at handling binary values. The solution is an encoding that turns the bytes into characters, not just any character, but a very small subset of well-behaved characters. This is called Base64. Base64 is a generic scheme for encoding bytes into character strings that don't include problematic characters (no zero, only ASCII printable characters, excluding all the control characters and a few others such as quotes).

Not using Base64 would imply assuming that JavaScript can manage an arbitrary sequence of bytes as if it was just "normal characters", and that is simply not true.

answered Sep 29 '22 20:09

Tom Leek

There is a fundamental, security-related reason to store as Base64 rather than Unicode: the hash may contain the byte value "0", used by many programming languages as an end-of-string marker.

If you store your hash as Unicode, you, another programmer, or some library code you use may treat it as a string rather than a collection of bytes, and compare using strcmp() or a similar string-comparison function. If your hash contains the byte value "0", you've effectively truncated your hash to just the portion before the "0", making attacks much easier.

Base64 encoding avoids this problem: the byte value "0" cannot occur in the encoded form of the hash, so it doesn't matter if you compare encoded hashes using memcmp() (the right way) or strcmp() (the wrong way).

This isn't just a theoretical concern, either: there have been multiple cases of code for checking digital signatures using strcmp(), greatly weakening security.

answered Sep 29 '22 18:09

Mark

Related questions
                            
                                Error after Fingerprint touched on Samsung phones: android.security.KeyStoreException: Key user not authenticated
                            
                                Where's the encryption key stored in Jenkins?
                            
                                postfix and openJDK 11: "No appropriate protocol (protocol is disabled or cipher suites are inappropriate)"
                            
                                Is a GUID a good key for (temporary) encryption?
                            
                                AES encryption: InvalidKeyException: Key length not 128/192/256 bits
                            
                                How to encrypt/decrypt URL parameters in javascript?
                            
                                Is there any difference, if I init AES cipher, with and without IvParameterSpec
                            
                                What's the recommended hashing algorithm to use for stored passwords?
                            
                                Encrypt and decrypt large string in java using RSA [closed]
                            
                                PyPDF 2 Decrypt Not Working
                            
                                How to convert from String to PublicKey?
                            
                                Fractal Encryption
                            
                                PGP Encrypt and Decrypt
                            
                                How to create encrypted Jar file? [closed]
                            
                                Why not use AES for password encryption in PHP?
                            
                                JavaScript atob operation using PHP
                            
                                How to implement Triple DES in C# (complete example)
                            
                                Is it secure to store a password in a session? [duplicate]
                            
                                RSA Encrypt / Decrypt Problem in .NET
                            
                                Java - Encrypt String with existing public key file

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Should I Use Base64 or Unicode for Storing Hashes & Salts?

Tags:

unicode

encryption

trysis

People also ask

2 Answers

Tom Leek

Mark

Recent Activity

Donate For Us