I'm trying to port a Python (2.7) script to Java. It applies a SHA-256 hash repeatedly, but the two versions end up with different results. I've noticed that the first iteration returns the same result in both, but from there on they differ.
Here is the Python implementation:
import hashlib
def to_hex(s):
  print " ".join(hex(ord(i)) for i in s)
d = hashlib.sha256()
print "Entry:"
r = chr(1)
to_hex(r)
for i in range(2):
  print "Loop", i
  d.update(r)
  r = d.digest()
  to_hex(r)
And in Java:
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
public class LoopTest {
  public static void main(String[] args) {
    MessageDigest d;
    try {
      d = MessageDigest.getInstance("SHA-256");
    } catch (NoSuchAlgorithmException e) {
      System.out.println("NoSuchAlgorithmException");
      return;
    }
    System.out.println("Entry:");
    byte[] r = new byte[] {1};
    System.out.println(toHex(r));
    for(int i = 0; i < 2; i++) {
      System.out.printf("Loop %d\n", i);
      d.update(r);
      r = d.digest();
      System.out.println(toHex(r));
    }
  }
  private static String toHex(byte[] bytes) {
    StringBuilder sb = new StringBuilder(bytes.length);
    for (byte b: bytes) {
       sb.append(String.format("0x%02X ", b));
    }
    return sb.toString();
  }
}
The outputs are, for Python:
$ python looptest.py
Entry:
0x1
Loop 0
0x4b 0xf5 0x12 0x2f 0x34 0x45 0x54 0xc5 0x3b 0xde 0x2e 0xbb 0x8c 0xd2 0xb7 0xe3 0xd1 0x60 0xa 0xd6 0x31 0xc3 0x85 0xa5 0xd7 0xcc 0xe2 0x3c 0x77 0x85 0x45 0x9a
Loop 1
0x98 0x1f 0xc8 0xd4 0x71 0xa8 0xb0 0x19 0x32 0xe3 0x84 0xac 0x1c 0xd0 0xa0 0x62 0xc4 0xdb 0x2c 0xe 0x13 0x58 0x61 0x9a 0x83 0xd1 0x67 0xf5 0xe8 0x4e 0x6a 0x17
And for Java:
$ java LoopTest
Entry:
0x01
Loop 0
0x4B 0xF5 0x12 0x2F 0x34 0x45 0x54 0xC5 0x3B 0xDE 0x2E 0xBB 0x8C 0xD2 0xB7 0xE3 0xD1 0x60 0x0A 0xD6 0x31 0xC3 0x85 0xA5 0xD7 0xCC 0xE2 0x3C 0x77 0x85 0x45 0x9A
Loop 1
0x9C 0x12 0xCF 0xDC 0x04 0xC7 0x45 0x84 0xD7 0x87 0xAC 0x3D 0x23 0x77 0x21 0x32 0xC1 0x85 0x24 0xBC 0x7A 0xB2 0x8D 0xEC 0x42 0x19 0xB8 0xFC 0x5B 0x42 0x5F 0x70
What could be the reason for this difference?
Edit:
Thanks for the answers, @dcsohl and @Alik; I understand the reason now. Since I'm porting the Python script to Java, I had to keep the Python one as it is, so I modified the Java program like this:
byte[] r2 = new byte[]{};
for(int i = 0; i < 2; i++) {
  System.out.printf("Loop %d\n", i);
  d.update(r);
  r2 = d.digest();
  System.out.println(toHex(r2));
  byte[] c = new byte[r.length + r2.length];
  System.arraycopy(r, 0, c, 0, r.length);
  System.arraycopy(r2, 0, c, r.length, r2.length);
  r = c;
}
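For comparison, here is a minimal Python sketch of why that workaround lines up with the original script. The function names cumulative_rounds and rehash_rounds are hypothetical, chosen only for illustration, and the code uses bytes literals so that it also runs on Python 3, unlike the 2.7 snippets above. The first function keeps one hasher alive, as the Python script does; the second hashes a growing buffer from scratch each round, as the modified Java loop does.

```python
import hashlib

def cumulative_rounds(seed, rounds):
    # Mirrors the original Python script: one hasher that is never reset.
    d = hashlib.sha256()
    r = seed
    out = []
    for _ in range(rounds):
        d.update(r)
        r = d.digest()
        out.append(r)
    return out

def rehash_rounds(seed, rounds):
    # Mirrors the modified Java loop: a fresh hash over everything fed so far.
    buf = seed
    out = []
    for _ in range(rounds):
        r2 = hashlib.sha256(buf).digest()
        out.append(r2)
        buf = buf + r2  # same role as the System.arraycopy calls
    return out

seed = b"\x01"
assert cumulative_rounds(seed, 5) == rehash_rounds(seed, 5)
```

Note that the buffer grows by 32 bytes per round, so this approach re-hashes more data each time; that is fine for a few rounds but worth keeping in mind for long loops.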
The two languages handle update() and digest() differently.
The Python documentation for update() says:
Update the hash object with the string arg. Repeated calls are equivalent to a single call with the concatenation of all the arguments:
m.update(a); m.update(b) is equivalent to m.update(a+b).
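A quick way to see that documented equivalence, as a small sketch with arbitrary illustrative inputs (the byte strings below are not from the question):

```python
import hashlib

a = b"abc"  # arbitrary example data
b = b"def"

split = hashlib.sha256()
split.update(a)
split.update(b)

joined = hashlib.sha256()
joined.update(a + b)

# Two update() calls hash the same byte stream as one call on a + b.
same = split.hexdigest() == joined.hexdigest()
print(same)  # True
```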
I tested this by using the shell sha256sum command.
$ printf '\x01\x4b\xf5\x12\x2f\x34\x45\x54\xc5\x3b\xde\x2e\xbb\x8c\xd2\xb7\xe3\xd1\x60\x0a\xd6\x31\xc3\x85\xa5\xd7\xcc\xe2\x3c\x77\x85\x45\x9a' | sha256sum
981fc8d471a8b01932e384ac1cd0a062c4db2c0e1358619a83d167f5e84e6a17 *-
You started with 0x01, so that's the first byte; the remaining bytes are the hash of 0x01. The resulting hash matches your Python output.
Now look at this: I omitted the initial 0x01, and the hash I got back matches your Java output.
$ printf '\x4b\xf5\x12\x2f\x34\x45\x54\xc5\x3b\xde\x2e\xbb\x8c\xd2\xb7\xe3\xd1\x60\x0a\xd6\x31\xc3\x85\xa5\xd7\xcc\xe2\x3c\x77\x85\x45\x9a' | sha256sum
9c12cfdc04c74584d787ac3d23772132c18524bc7ab28dec4219b8fc5b425f70 *-
But why? Shouldn't the initial 0x01 be included? It would be, except that the Javadoc for digest() says:
Completes the hash computation by performing final operations such as padding. The digest is reset after this call is made.
So your initial 0x01 gets dropped when you call digest() in Java, and on the next iteration you are simply hashing the previous digest without the leading 0x01.
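The same check can be done without a shell. The following is a minimal sketch using only hashlib; the variable names python_style and java_style are just labels for the two behaviours, and the code uses bytes literals so it also runs on Python 3.

```python
import hashlib

# Digest of the single byte 0x01 (the "Loop 0" value in both programs).
first = hashlib.sha256(b"\x01").digest()

# Python keeps the earlier input, so round two hashes 0x01 followed by the first digest.
python_style = hashlib.sha256(b"\x01" + first).hexdigest()

# Java resets on digest(), so round two hashes the first digest alone.
java_style = hashlib.sha256(first).hexdigest()

print(python_style)
print(java_style)
```

The two printed values should correspond to the "Loop 1" lines of the Python and Java outputs in the question, respectively.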
In Java, d.digest() returns the message digest and then resets the digest object.
In Python, d.digest() doesn't reset it. Thus, repeated calls to d.update() effectively concatenate the new data with everything passed in previous calls.
You can simply put d = hashlib.sha256() inside the loop:
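As a small illustration of the Python side (a sketch, written with bytes literals so it also runs on Python 3): calling digest() twice in a row gives the same value, and a later update() continues from the existing state instead of starting over.

```python
import hashlib

d = hashlib.sha256()
d.update(b"\x01")

first = d.digest()
again = d.digest()  # digest() leaves the internal state untouched
assert first == again

d.update(first)  # state now covers 0x01 followed by the first digest
continued = d.digest()

# Same result as hashing the whole byte stream with a fresh object.
assert continued == hashlib.sha256(b"\x01" + first).digest()
```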
import hashlib
def to_hex(s):
  print " ".join(hex(ord(i)) for i in s)
print "Entry:"
r = chr(1)
to_hex(r)
for i in range(2):
  print "Loop", i
  d = hashlib.sha256()
  d.update(r)
  r = d.digest()
  to_hex(r)
to get the same results as your Java program:
Entry:
0x1
Loop 0
0x4b 0xf5 0x12 0x2f 0x34 0x45 0x54 0xc5 0x3b 0xde 0x2e 0xbb 0x8c 0xd2 0xb7 0xe3 0xd1 0x60 0xa 0xd6 0x31 0xc3 0x85 0xa5 0xd7 0xcc 0xe2 0x3c 0x77 0x85 0x45 0x9a
Loop 1
0x9c 0x12 0xcf 0xdc 0x4 0xc7 0x45 0x84 0xd7 0x87 0xac 0x3d 0x23 0x77 0x21 0x32 0xc1 0x85 0x24 0xbc 0x7a 0xb2 0x8d 0xec 0x42 0x19 0xb8 0xfc 0x5b 0x42 0x5f 0x70