I simply want to convert a string of any length to an integer value. Each string should map to a unique, or even a non-unique, integer. Is there an existing open-source command that does this?
Bonus points if it is unique, such as computing the lexicographical order via a bash command.
Hashing simply means passing some data through a function that produces a result, called a hash. The hash is usually written as a string of characters, and a given hash function always produces hashes of the same length, regardless of how much data you feed into it. For example, MD5 always produces a 128-bit hash, usually written as 32 hexadecimal characters.
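To see the fixed-length property for yourself, here is a minimal sketch. It assumes the coreutils md5sum utility is installed; md5_len is a hypothetical helper name, not an existing command.

```shell
# md5_len is a hypothetical helper: print the length of the MD5 hex digest of a string.
md5_len() {
    # md5sum prints "<digest>  -"; keep only the digest, then count its characters.
    printf '%s' "$1" | md5sum | cut -d ' ' -f 1 | tr -d '\n' | wc -c | tr -d ' '
}

md5_len "a"                                                       # prints 32
md5_len "a considerably longer input string than the first one"   # prints 32
```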
The hash command in Linux is a bash built-in that maintains a hash table of recently executed programs. It remembers and shows where those programs are located, giving the full pathname for each command name. It does not compute a hash of a string.
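For contrast with string hashing, this is roughly how the built-in is used. It is a sketch that assumes a POSIX-style shell such as bash and that ls is on your PATH; the format of the listing differs between shells.

```shell
hash -r    # forget every remembered command location
hash ls    # look up ls on the PATH and remember where it was found
hash       # list the remembered locations, including the full path of ls
```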
You need to be careful about using hash functions from common programming languages. It has become common to introduce randomized seeds into hash functions, so that hash values are only consistent within a single program execution. This avoids a denial-of-service attack noted in oCERT advisory 2011-003. (As that advisory notes, the problem was described in 2003 in a paper presented to Usenix.)
For example, the Python hash function has been randomized by default since v3.3:
$ python3 -c 'from sys import argv;print(hash(argv[1]))' abc
-2595772619214671013
$ python3 -c 'from sys import argv;print(hash(argv[1]))' abc
-6001956461950650533
$ python3 -c 'from sys import argv;print(hash(argv[1]))' abc
-7414807274805087300
$ python3 -c 'from sys import argv;print(hash(argv[1]))' abc
-327608370992723225
# Python2 generates consistent hash values
$ python -c 'from sys import argv;print(hash(argv[1]))' abc
1453079729188098211
$ python -c 'from sys import argv;print(hash(argv[1]))' abc
1453079729188098211
$ python -c 'from sys import argv;print(hash(argv[1]))' abc
1453079729188098211
You can control hash randomization in Python by setting the PYTHONHASHSEED environment variable.
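As a sketch of that, assuming python3 is on your PATH: setting PYTHONHASHSEED=0 disables the randomization, so repeated runs print the same value. py_hash is a hypothetical helper name. The value itself can still differ between Python versions and builds, so don't treat it as portable.

```shell
# py_hash is a hypothetical helper: Python's built-in hash() with randomization disabled.
py_hash() {
    PYTHONHASHSEED=0 python3 -c 'import sys; print(hash(sys.argv[1]))' "$1"
}

py_hash abc
py_hash abc   # prints the same value as the line above
```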
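Or you can use a standardized cryptographic hash like SHA-1. The commonly available sha1sum utility outputs its result in hexadecimal, but you can convert that to decimal with bash (truncated to 64 bits). The trailing 0 is there because sha1sum prints the digest followed by " -", so the arithmetic expression becomes 0x<digest> - 0: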
$ echo $((0x$(sha1sum <<<"string to hash")0))
-7037254581539467098
or in its full 160-bit glory, using bc (which requires hex to be written in upper-case):
$ bc <<<ibase=16\;$(sha1sum <<<"string to hash"|tr a-z A-Z)0
861191872165666513280590001082621748432296579238
If you only need a hash value in a range whose size is a power of 16, you can use the first few hex digits of the SHA-1 sum. (You could use any selection of digits, since they're all equally well distributed, but the first few are easier to extract.) For example, the first two digits give a value between 0 and 255:
$ echo $((0x$(sha1sum <<<"string to hash"|cut -c1-2)))
150
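If the range you want isn't a power of 16, one option is to take a few more digits and reduce them with the modulo operator. This is only a sketch: bucket is a hypothetical helper name, it hashes the string without a trailing newline (so its results won't match the here-string commands above), and the modulo introduces a slight bias when the bucket count doesn't divide 2^32.

```shell
# bucket is a hypothetical helper: map string $1 to a bucket number in 0..($2 - 1).
bucket() {
    # Use the first 8 hex digits (32 bits) of the SHA-1 digest,
    # which fits comfortably in shell arithmetic.
    echo $(( 0x$(printf '%s' "$1" | sha1sum | cut -c1-8) % $2 ))
}

bucket "string to hash" 10   # prints a number between 0 and 9
```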
Note: As @gniourf_gniourf points out in a comment, the above doesn't really compute the SHA-1 checksum of the given string because the bash here-string syntax (<<<word) appends a newline to word. Since the checksum of the string with a newline appended is just as good a hash as the checksum of the string itself, there is no problem as long as you always use the same mechanism to produce the hash.
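To make the difference concrete, here is a small sketch using printf, which lets you choose whether a newline is hashed. Both function names are hypothetical helpers.

```shell
# Digest of the bare string: no trailing newline is hashed.
sha1_exact() { printf '%s' "$1" | sha1sum | cut -d ' ' -f 1; }

# Digest of the string plus a newline, which is what a here-string feeds to sha1sum.
sha1_with_newline() { printf '%s\n' "$1" | sha1sum | cut -d ' ' -f 1; }

sha1_exact abc          # a9993e364706816aba3e25717850c26c9cd0d89d, the standard SHA-1 test vector for "abc"
sha1_with_newline abc   # a different digest
```

If the hash has to match one computed by another tool or language, sha1_exact is the variant to use.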
You could use the sum or cksum command (the latter being preferred) to generate a base-10 integer:
$ cksum <<< 'hello world' | cut -f 1 -d ' '
3733384285
$ cksum <<< 'goodbye world' | cut -f 1 -d ' '
2600070097
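If you'd rather wrap this in a function, something like the following might do. str_to_int is a hypothetical name, and it uses printf instead of a here-string, so no trailing newline is hashed and the numbers will differ from the ones above. Keep in mind that cksum is a 32-bit CRC, so the result is always between 0 and 4294967295 and different strings can collide.

```shell
# str_to_int is a hypothetical helper: print the cksum CRC of a string as a decimal integer.
str_to_int() {
    # cksum prints "<crc> <byte count>"; keep only the CRC.
    printf '%s' "$1" | cksum | cut -d ' ' -f 1
}

str_to_int 'hello world'
```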
If you're interested in the math behind these simple hashes, check out the source implementations of sum and cksum. Note that sum selects its algorithm with the -r and -s command-line arguments.