Hash function that produces short hashes?

People also ask

Which hash function is fastest?

SHA-1 is fastest hashing function with ~587.9 ms per 1M operations for short strings and 881.7 ms per 1M for longer strings. MD5 is 7.6% slower than SHA-1 for short strings and 1.3% for longer strings.

What is MD5 and SHA-256?

Both MD5 and SHA256 are used as hashing algorithms. They take an input file and generate an output which can be of 256/128-bit size. This output represents a checksum or hash value. As, collisions are very rare between hash values, so no encryption takes place.

Why hash types create a hash of a different length?

A larger bit hash can provide more security because there are more possible combinations. Remember that one of the important functions of a cryptographic hashing algorithm is that is produces unique hashes. Again, if two different values or files can produce the same hash, you create what we call a collision.

You can use any commonly available hash algorithm (eg. SHA-1), which will give you a slightly longer result than what you need. Simply truncate the result to the desired length, which may be good enough.

For example, in Python:

>>> import hashlib
>>> hash = hashlib.sha1("my message".encode("UTF-8")).hexdigest()
>>> hash
'104ab42f1193c336aa2cf08a2c946d5c6fd0fcdb'
>>> hash[:10]
'104ab42f11'

If you don't need an algorithm that's strong against intentional modification, I've found an algorithm called adler32 that produces pretty short (~8 character) results. Choose it from the dropdown here to try it out:

http://www.sha1-online.com/

You need to hash the contents to come up with a digest. There are many hashes available but 10-characters is pretty small for the result set. Way back, people used CRC-32, which produces a 33-bit hash (basically 4 characters plus one bit). There is also CRC-64 which produces a 65-bit hash. MD5, which produces a 128-bit hash (16 bytes/characters) is considered broken for cryptographic purposes because two messages can be found which have the same hash. It should go without saying that any time you create a 16-byte digest out of an arbitrary length message you're going to end up with duplicates. The shorter the digest, the greater the risk of collisions.

However, your concern that the hash not be similar for two consecutive messages (whether integers or not) should be true with all hashes. Even a single bit change in the original message should produce a vastly different resulting digest.

So, using something like CRC-64 (and base-64'ing the result) should get you in the neighborhood you're looking for.

You can use the hashlib library for Python. The shake_128 and shake_256 algorithms provide variable length hashes. Here's some working code (Python3):

import hashlib
>>> my_string = 'hello shake'
>>> hashlib.shake_256(my_string.encode()).hexdigest(5)
'34177f6a0a'

Notice that with a length parameter x (5 in example) the function returns a hash value of length 2x.

Just summarizing an answer that was helpful to me (noting @erasmospunk's comment about using base-64 encoding). My goal was to have a short string that was mostly unique...

I'm no expert, so please correct this if it has any glaring errors (in Python again like the accepted answer):

import base64
import hashlib
import uuid

unique_id = uuid.uuid4()
# unique_id = UUID('8da617a7-0bd6-4cce-ae49-5d31f2a5a35f')

hash = hashlib.sha1(str(unique_id).encode("UTF-8"))
# hash.hexdigest() = '882efb0f24a03938e5898aa6b69df2038a2c3f0e'

result = base64.b64encode(hash.digest())
# result = b'iC77DySgOTjliYqmtp3yA4osPw4='

The result here is using more than just hex characters (what you'd get if you used hash.hexdigest()) so it's less likely to have a collision (that is, should be safer to truncate than a hex digest).

Note: Using UUID4 (random). See http://en.wikipedia.org/wiki/Universally_unique_identifier for the other types.

If you need "sub-10-character hash" you could use Fletcher-32 algorithm which produces 8 character hash (32 bits), CRC-32 or Adler-32.

CRC-32 is slower than Adler32 by a factor of 20% - 100%.

Fletcher-32 is slightly more reliable than Adler-32. It has a lower computational cost than the Adler checksum: Fletcher vs Adler comparison.

A sample program with a few Fletcher implementations is given below:

    #include <stdio.h>
    #include <string.h>
    #include <stdint.h> // for uint32_t

    uint32_t fletcher32_1(const uint16_t *data, size_t len)
    {
            uint32_t c0, c1;
            unsigned int i;

            for (c0 = c1 = 0; len >= 360; len -= 360) {
                    for (i = 0; i < 360; ++i) {
                            c0 = c0 + *data++;
                            c1 = c1 + c0;
                    }
                    c0 = c0 % 65535;
                    c1 = c1 % 65535;
            }
            for (i = 0; i < len; ++i) {
                    c0 = c0 + *data++;
                    c1 = c1 + c0;
            }
            c0 = c0 % 65535;
            c1 = c1 % 65535;
            return (c1 << 16 | c0);
    }

    uint32_t fletcher32_2(const uint16_t *data, size_t l)
    {
        uint32_t sum1 = 0xffff, sum2 = 0xffff;

        while (l) {
            unsigned tlen = l > 359 ? 359 : l;
            l -= tlen;
            do {
                sum2 += sum1 += *data++;
            } while (--tlen);
            sum1 = (sum1 & 0xffff) + (sum1 >> 16);
            sum2 = (sum2 & 0xffff) + (sum2 >> 16);
        }
        /* Second reduction step to reduce sums to 16 bits */
        sum1 = (sum1 & 0xffff) + (sum1 >> 16);
        sum2 = (sum2 & 0xffff) + (sum2 >> 16);
        return (sum2 << 16) | sum1;
    }

    int main()
    {
        char *str1 = "abcde";  
        char *str2 = "abcdef";

        size_t len1 = (strlen(str1)+1) / 2; //  '\0' will be used for padding 
        size_t len2 = (strlen(str2)+1) / 2; // 

        uint32_t f1 = fletcher32_1(str1,  len1);
        uint32_t f2 = fletcher32_2(str1,  len1);

        printf("%u %X \n",    f1,f1);
        printf("%u %X \n\n",  f2,f2);

        f1 = fletcher32_1(str2,  len2);
        f2 = fletcher32_2(str2,  len2);

        printf("%u %X \n",f1,f1);
        printf("%u %X \n",f2,f2);

        return 0;
    }

Output:

4031760169 F04FC729                                                                                                                                                                                                                              
4031760169 F04FC729                                                                                                                                                                                                                              

1448095018 56502D2A                                                                                                                                                                                                                              
1448095018 56502D2A

Agrees with Test vectors:

"abcde"  -> 4031760169 (0xF04FC729)
"abcdef" -> 1448095018 (0x56502D2A)

Adler-32 has a weakness for short messages with few hundred bytes, because the checksums for these messages have a poor coverage of the 32 available bits. Check this:

The Adler32 algorithm is not complex enough to compete with comparable checksums.

Related questions
                            
                                What's the difference between id_rsa.pub and id_dsa.pub?
                            
                                What's the difference between SHA and AES encryption? [closed]
                            
                                Best practices around generating OAuth tokens?
                            
                                The necessity of hiding the salt for a hash
                            
                                SVN encrypted password store
                            
                                Image encryption/decryption using AES256 symmetric block ciphers [closed]
                            
                                AES vs Blowfish for file encryption
                            
                                mcrypt is deprecated, what is the alternative?
                            
                                I need to securely store a username and password in Python, what are my options? [closed]
                            
                                What is the meaning of ToString("X2")? [duplicate]
                            
                                What is the difference between DSA and RSA?
                            
                                Javascript AES encryption [closed]
                            
                                What are the differences between .pem, .cer and .der?
                            
                                Converting Secret Key into a String and Vice Versa
                            
                                Initial bytes incorrect after Java AES/CBC decryption
                            
                                How to encrypt/decrypt data in php?
                            
                                How does BitLocker affect performance? [closed]
                            
                                AES Encryption for an NSString on the iPhone
                            
                                How to encrypt bytes using the TPM (Trusted Platform Module)
                            
                                How to send password securely over HTTP?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Hash function that produces short hashes?

Tags:

uniqueidentifier

encryption

People also ask

Recent Activity

Donate For Us