
Is Python's hashlib.sha256(x).hexdigest() equivalent to R's digest(x, algo="sha256")?

Tags: python, r

I'm not a Python programmer, but I'm trying to translate some Python code to R. The piece of Python code I'm having trouble with is:

hashlib.sha256(x).hexdigest()

My interpretation of this code is that the function is going to calculate the hash of x using the sha256 algorithm and return the value in hex.

Given that interpretation, I am using the following R function:

digest(x, algo="sha256", raw=FALSE)

Based on my admittedly limited knowledge of R and what I have read online about Python's hashlib module, the two calls should produce identical results, but they do not.

Am I missing something, or am I using the wrong R function?

Mutuelinvestor asked Jul 03 '15 14:07




1 Answer

Yes, both the Python and the R sample code return a hexadecimal representation of the SHA-256 hash digest of the data passed in.

You do need to switch off serialisation in R; otherwise digest() first serialises the string and hashes that serialised form rather than the character data alone. Set serialize to FALSE:

> digest('', algo="sha256", serialize=FALSE)
[1] "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
> digest('hello world', algo="sha256", serialize=FALSE)
[1] "b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9"

These match their Python equivalents (bytes literals are used here because Python 3's hashlib accepts only bytes, not text strings):

>>> import hashlib
>>> hashlib.sha256(b'').hexdigest()
'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'
>>> hashlib.sha256(b'hello world').hexdigest()
'b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9'
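
If x is a text string and not bytes, Python 3 needs it encoded before hashing. The following is a minimal sketch of that step. The helper name sha256_hex and the choice of UTF-8 are assumptions made for illustration, not part of the original code being translated; the encoding has to match whatever bytes R ends up hashing.

```python
import hashlib


def sha256_hex(text, encoding="utf-8"):
    """Return the SHA-256 hex digest of a text string.

    Hypothetical helper: the string is encoded to bytes first, because
    hashlib in Python 3 only accepts bytes-like input.
    """
    return hashlib.sha256(text.encode(encoding)).hexdigest()


# The same two inputs used in the R session above.
empty_hash = sha256_hex("")
hello_hash = sha256_hex("hello world")

print(empty_hash)
print(hello_hash)
```

For pure ASCII input such as these two strings the encoding makes no difference; for anything else, a different encoding on either side gives different bytes and therefore a different hash.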

If your hashes still differ between R and Python, then your data is different. The difference could be as subtle as a newline at the end of the line or a byte order mark at the start.
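
To illustrate how little it takes, here is a small sketch that hashes the same text three ways: as is, with a trailing newline, and with a UTF-8 byte order mark in front. The variable names are made up for this example.

```python
import codecs
import hashlib


def sha256_hex(data):
    """Return the SHA-256 hex digest of a bytes value."""
    return hashlib.sha256(data).hexdigest()


clean = b"hello world"
variants = {
    "clean": clean,
    "trailing newline": clean + b"\n",           # e.g. a line read from a file
    "utf-8 bom": codecs.BOM_UTF8 + clean,        # e.g. a file saved with a BOM
}

digests = {name: sha256_hex(data) for name, data in variants.items()}

for name, digest in digests.items():
    print(name, digest)
```

All three digests are different, even though the three values look the same when printed to a terminal.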

In Python, inspect the output of print(repr(x)) to represent the data as a Python string literal; this shows non-printable characters as escape sequences. I'm sure R has similar debugging tools. Both R and Python echo string values as representations when using their interactive modes.
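
As a rough sketch of that kind of inspection on the Python side, the sample value below is invented and deliberately carries both a leading byte order mark and a trailing newline:

```python
# Invented sample value with a byte order mark and a trailing newline.
x = "\ufeffhello world\n"

# repr() shows non-printable characters as escape sequences.
shown = repr(x)
print(shown)

# Looking at the encoded bytes (UTF-8 assumed here) makes the exact
# input to the hash function visible.
raw = x.encode("utf-8")
print(repr(raw))

# A hex dump is handy for comparing byte for byte against another tool.
plain_hex = "hello world\n".encode("utf-8").hex()
print(plain_hex)  # 68656c6c6f20776f726c640a
```

Comparing such a hex dump with the bytes R actually hashes should show where the two inputs diverge.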

Martijn Pieters answered Sep 29 '22 10:09