Bash, how to hash value of a string?

Tags: string, bash

I simply want to convert a string of any length to an integer value. Each string should map to an integer, which may or may not be unique. Is there an existing open-source command that does this?

Bonus points if it is unique, such as computing the lexicographical order via a bash command.

Zombies asked Mar 04 '15 00:03



2 Answers

You need to be careful about using hash functions from common programming languages. Many of them now seed their hash functions randomly, so hash values are consistent only within a single program execution. This mitigates the denial-of-service attack described in oCERT advisory 2011-003. (As that advisory notes, the problem was described in 2003 in a paper presented at Usenix.)

For example, the Python hash function has been randomized by default since v3.3:

$ python3 -c 'from sys import argv;print(hash(argv[1]))' abc
-2595772619214671013
$ python3 -c 'from sys import argv;print(hash(argv[1]))' abc
-6001956461950650533
$ python3 -c 'from sys import argv;print(hash(argv[1]))' abc
-7414807274805087300
$ python3 -c 'from sys import argv;print(hash(argv[1]))' abc
-327608370992723225
# Python2 generates consistent hash values
$ python -c 'from sys import argv;print(hash(argv[1]))' abc
1453079729188098211
$ python -c 'from sys import argv;print(hash(argv[1]))' abc
1453079729188098211
$ python -c 'from sys import argv;print(hash(argv[1]))' abc
1453079729188098211

You can control hash randomization in Python by setting the PYTHONHASHSEED environment variable.
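
For instance, fixing the seed should make Python 3 hashes repeatable across runs. This is a minimal sketch, assuming python3 is on the PATH; the function name stable_pyhash is made up for illustration, and the seed value 0 (which turns randomization off) is just one possible choice.

```bash
# Hypothetical helper: print Python's hash() of a string with randomization off.
stable_pyhash() {
    # PYTHONHASHSEED=0 disables hash randomization for this one invocation only.
    PYTHONHASHSEED=0 python3 -c 'from sys import argv; print(hash(argv[1]))' "$1"
}

stable_pyhash abc
stable_pyhash abc   # prints the same value as the previous line
```

The value may still differ between Python versions or platforms, so this is only suitable when every caller runs the same interpreter.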

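Or you can use a standardized cryptographic hash like SHA-1. The commonly available sha1sum utility outputs its result in hexadecimal, but you can convert that to decimal with bash (truncated to 64 bits). Because sha1sum prints the digest followed by " -", appending 0 makes the expression read as the digest minus zero: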

$ echo $((0x$(sha1sum <<<"string to hash")0))
-7037254581539467098

or in its full 160-bit glory, using bc (which requires hex to be written in upper-case):

$ bc <<<ibase=16\;$(sha1sum <<<"string to hash"|tr a-z A-Z)0
861191872165666513280590001082621748432296579238

If you only need a hash value in a range that is a power of 16, you can use the first few hex digits of the SHA-1 sum. (You could use any selection of digits -- they're all equally well distributed -- but the first few are easier to extract):

$ echo $((0x$(sha1sum <<<"string to hash"|cut -c1-2)))
150

Note: As @gniourf_gniourf points out in a comment, the above doesn't really compute the SHA-1 checksum of the given string because the bash here-string syntax (<<<word) appends a newline to word. Since the checksum of the string with a newline appended is just as good a hash as the checksum of the string itself, there is no problem as long as you always use the same mechanism to produce the hash.
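
If you do want the hash of the exact string, without the newline, one option is to feed it through printf instead of a here-string. The following is a sketch under assumptions: the function name str_hash is hypothetical, and keeping only the first 15 hex digits (60 bits) is my own choice so that the result stays non-negative in bash's signed 64-bit arithmetic.

```bash
# Hypothetical helper: hash a string to a non-negative decimal integer.
str_hash() {
    local hex
    # printf '%s' writes the string with no trailing newline, unlike <<<.
    hex=$(printf '%s' "$1" | sha1sum | cut -c1-15)
    # 15 hex digits are 60 bits, which fits in a signed 64-bit integer.
    echo $((16#$hex))
}

str_hash "string to hash"
```

As with the other variants, the result is only comparable with hashes produced the same way.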

rici answered Oct 10 '22 14:10


You could use the sum or cksum command (the latter being preferred) to generate a base-10 integer:

$ cksum <<< 'hello world' | cut -f 1 -d ' '
3733384285

$ cksum <<< 'goodbye world' | cut -f 1 -d ' '
2600070097
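
If the integer is meant to pick one of a fixed number of slots, the checksum can be reduced with a modulus. This is only a sketch: the function name bucket and its second argument (the number of slots) are assumptions made for illustration, not something cksum provides.

```bash
# Hypothetical helper: map a string to a bucket number in the range 0..N-1.
bucket() {
    local crc
    # Same pipeline as above; the here-string appends a newline before hashing.
    crc=$(cksum <<< "$1" | cut -f 1 -d ' ')
    # The CRC is an unsigned 32-bit value, so the remainder is never negative.
    echo $(( crc % $2 ))
}

bucket 'hello world' 16
bucket 'goodbye world' 16
```

Different strings can land in the same bucket, which is expected for a non-unique hash.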

If you're interested in the math behind these simple hashes, check out the source implementations:

  • cksum calculates a 32-bit CRC based on the AUTODIN II polynomial, the same polynomial used by Ethernet
  • sum calculates either the BSD 16-bit checksum or the System V checksum, depending upon the -r and -s command-line arguments. Neither is a true CRC.

bishop answered Oct 10 '22 14:10