If I run this command on a Mac:
echo hello | shasum -a 256
or this one on Ubuntu:
echo hello | sha256sum
then I get the following result:
5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03 -
I notice there is a dash at the end.
But when I use Python's hashlib or Java's java.security.MessageDigest, they both give me the same result, which differs from the one above:
2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
So, could anyone point out where I went wrong, please?
Thanks.
Python:
>>> import hashlib
>>> hashlib.sha256(b"hello").hexdigest()  # Python 3 requires bytes, not str
Java:
MessageDigest md = MessageDigest.getInstance("SHA-256");
String text = "hello";
md.update(text.getBytes("UTF-8"));
byte[] digest = md.digest();
StringBuffer sb = new StringBuffer();
for (int i = 0; i < digest.length; i++) {
sb.append(String.format("%02x", digest[i] & 0xFF));
}
System.out.println(sb.toString());
In principle, SHA-256 is a well-defined deterministic function that should always yield the same output for the same input.
This is entirely spurious, yet it will still result in a different textual representation of an identical numeric value. If the encoding of the string differs, then the binary input to the hash algorithm differs, and for a common cryptographic hash you will get results that differ in about 50% of the bits.
Yes, if you hash the same input with the same function, you will always get the same result. This follows from the fact that it is a hash function. By definition, a function is a relation between a set of inputs and a set of permissible outputs with the property that each input is related to exactly one output.
SHA-256 is designed so that changing even a single character of the input changes the output completely. It produces a 256-bit (32-byte) digest, which is usually displayed as 64 hexadecimal characters. No matter how long the input is, the digest always has this length.
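To illustrate the fixed output length and the "about 50% of the bits" remark, here is a minimal sketch using only Python's standard hashlib. The helper name bit_difference and the second input "hellp" are made up for this example; the exact number of differing bits depends on the inputs, so none is claimed here.

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


d1 = hashlib.sha256(b"hello").digest()
d2 = hashlib.sha256(b"hellp").digest()  # only the last character changed

# The digest is always 32 bytes, i.e. 64 hex characters, whatever the input length.
print(len(d1), len(d1.hex()))
print(len(hashlib.sha256(b"x" * 10_000).hexdigest()))

# Typically somewhere around half of the 256 bits differ.
print(bit_difference(d1, d2), "of", len(d1) * 8, "bits differ")
```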
The echo commands are adding a trailing newline to your string. Try:
hashlib.sha256(b"hello\n").hexdigest()
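As a quick check of this explanation, the sketch below hashes the string with and without the trailing newline and compares the results against the two values quoted in the question. It assumes Python 3, where the input must be given as bytes.

```python
import hashlib

without_newline = hashlib.sha256(b"hello").hexdigest()
with_newline = hashlib.sha256(b"hello\n").hexdigest()

# Values copied from the question.
from_hashlib_and_java = "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824"
from_echo_pipeline = "5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03"

print(without_newline == from_hashlib_and_java)  # no newline: matches hashlib/Java
print(with_newline == from_echo_pipeline)        # with newline: matches echo | sha256sum
```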
TL;DR: this is an extensive answer explaining character and hex encoding; you can skip it and look at the code below.
The sha256sum and related commands are what add the dash (-) to the output. These commands were made to show the hash values of files. A single dash simply means that the input came from the standard input stream (i.e. there is no file name). Unfortunately I don't see an option to suppress the dash, so you have to remove it yourself to get to the actual hash value.
So the hash utilities do not return only the hash value. A SHA-256 hash value simply consists of 32 bytes. As humans cannot easily read binary, the bytes are displayed as hexadecimals, but the actual value should still be thought of as bytes. The hexadecimal characters are just a representation of those bytes.
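The distinction between the 32-byte value and its hexadecimal representation can be sketched in Python as follows. This is only an illustration with hashlib; the variable names are arbitrary.

```python
import hashlib

raw = hashlib.sha256(b"hello").digest()          # the actual value: 32 bytes
as_hex = hashlib.sha256(b"hello").hexdigest()    # a textual form: 64 hex characters

# Different spellings of the hex text decode to the same 32 bytes.
same_lower = bytes.fromhex(as_hex) == raw
same_upper = bytes.fromhex(as_hex.upper()) == raw

# Inserting a space between each byte changes the text, not the value.
spaced = " ".join(as_hex[i:i + 2] for i in range(0, len(as_hex), 2))
same_spaced = bytes.fromhex(spaced) == raw

print(len(raw), len(as_hex), same_lower, same_upper, same_spaced)
```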
The input of a hash function consists of bits, or rather bytes, as well. This means that any difference in how the text is encoded will result in a different hash value. This is especially tricky when it comes to whitespace and end-of-line encoding. In the case of "hello", instead of adding a trailing newline it is probably better to suppress it with the -n command line option of the echo command (not every echo implementation supports -n; printf is a more portable alternative).
Beware that hexadecimals themselves can also be displayed in different ways; you should make sure that no whitespace is present and that the comparison is case-insensitive, or that the representation of the bytes always uses the same case.
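A minimal sketch of such a normalization in Python, mirroring what the tr commands below do. The helper name normalize_hex_digest is hypothetical, and it assumes the input was hashed from standard input, so the only non-hex characters are whitespace and the dash; it is not meant for output that contains a real file name.

```python
def normalize_hex_digest(text: str) -> str:
    """Hypothetical helper: drop whitespace and dashes, then lowercase the rest."""
    cleaned = "".join(ch for ch in text if not ch.isspace() and ch != "-")
    return cleaned.lower()


# Output in the style of sha256sum reading from standard input.
tool_output = "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824  -\n"
# The same value written in upper case with a space after every byte.
spaced_upper = " 2C F2 4D BA 5F B0 A3 0E 26 E8 3B 2A C5 B9 E2 9E 1B 16 1E 5C 1F A7 42 5E 73 04 33 62 93 8B 98 24\n"

print(normalize_hex_digest(tool_output) == normalize_hex_digest(spaced_upper))
```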
Using sha256sum:
echo -n "hello" | sha256sum | tr -d "[:space:]-"
Using OpenSSL command line:
echo -n hello | openssl sha256 -binary | od -An -tx1 | tr -d "[:space:]"
Here od -An -tx1 shows each byte separately; grouping the bytes into larger words could lead to problems with endianness. tr -d "[:space:]" removes the spaces from the hexadecimals as well as the trailing newline. For sha256sum the dash file indicator is also removed (note the - at the end of the tr argument). This way it is possible to perform a textual (case-insensitive) comparison.
In Python, without the trailing end of line:
print(hashlib.sha256(b"hello").hexdigest(), end="")
In the case of Java you should also make sure that the text encoding matches the encoding the shell used for the input, which is normally the platform default, or you may get into trouble. So you could change:
md.update(text.getBytes("UTF-8"));
to
md.update(text.getBytes());
to use the platform character encoding. If you don't, the comparison will fail whenever the platform encoding is not compatible with UTF-8 for the string you want to compare. For a pure ASCII string such as "hello" the two encodings usually produce the same bytes.
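The effect of the character encoding can be sketched in Python as well. The helper sha256_hex and the sample string with an accented character are made up for this illustration; Latin-1 merely stands in for a platform encoding that is not UTF-8.

```python
import hashlib


def sha256_hex(text: str, encoding: str) -> str:
    """Encode the text with the given encoding and hash the resulting bytes."""
    return hashlib.sha256(text.encode(encoding)).hexdigest()


# Pure ASCII text encodes to the same bytes in UTF-8 and Latin-1.
ascii_matches = sha256_hex("hello", "utf-8") == sha256_hex("hello", "latin-1")

# "h\u00e9llo" contains an accented e: two bytes in UTF-8, one in Latin-1.
accented_matches = sha256_hex("h\u00e9llo", "utf-8") == sha256_hex("h\u00e9llo", "latin-1")

print(ascii_matches, accented_matches)
```

So the choice of encoding only starts to matter once the text contains characters outside the range on which the encodings agree.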