
Is there any way to generate a UUID in Java that is identical to that of the one generated in C#?

I'm porting a C# script into Spark (Scala) and I'm running into an issue with UUID generation in Scala vs GUID generation in C#.

I'm generating the primary key for a database by creating a Guid from the MD5 hash of a string. Ultimately, I'd like to generate UUIDs in Java/Scala that match those from the C# script, so the existing data in the database that used the C# implementation for hashing doesn't need to be rehashed.

C# to port:

using System;
using System.Security.Cryptography;
using System.Text;

String ex = "Hello World";
Console.WriteLine("String to Hash: {0}", ex);
byte[] md5 = GetMD5Hash(ex);
Console.WriteLine("Hash: {0}", BitConverter.ToString(md5));
Guid guid = new Guid(md5);
Console.WriteLine("Guid: {0}", guid);

// MD5 hash of the UTF-8 encoding of s
private static byte[] GetMD5Hash(string s) {
  using (MD5 md5 = MD5.Create())
    return md5.ComputeHash(Encoding.UTF8.GetBytes(s));
}

Scala ported code:

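import java.nio.charset.StandardCharsets
import java.security.MessageDigest
import java.util.UUID

val to_encode = "Hello World"
// C# hashes the UTF-8 bytes, so ask for UTF-8 explicitly instead of the platform default
val md5hash = MessageDigest.getInstance("MD5")
  .digest(to_encode.trim().getBytes(StandardCharsets.UTF_8))
val md5string = md5hash.map("%02x".format(_)).mkString
// nameUUIDFromBytes MD5-hashes its input itself, then sets the version/variant bits
val uuid_bytes = UUID.nameUUIDFromBytes(to_encode.trim().getBytes(StandardCharsets.UTF_8))
printf("String to encode: %s\n", to_encode)
printf("MD5: %s\n", md5string)
printf("UUID: %s\n", uuid_bytes.toString)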

Result from C#

  • String to hash: Hello World
  • MD5: B1-0A-8D-B1-64-E0-75-41-05-B7-A9-9B-E7-2E-3F-E5
  • Guid: b18d0ab1-e064-4175-05b7-a99be72e3fe5

Result from Scala

  • String to hash: Hello World
  • MD5: b10a8db164e0754105b7a99be72e3fe5
  • UUID: b10a8db1-64e0-3541-85b7-a99be72e3fe5

What works:

  • MD5 Hashes (which the GUID and UUID are based off of) match

What doesn't:

  • The first three fields have their endianness switched in C#
    • C#'s Guid uses native byte ordering (here little-endian) for the first three fields (4, 2 and 2 bytes) and big-endian ordering for the last field (8 bytes), while Java's UUID uses big-endian ordering for all fields. This explains the swapped bytes in the first three fields in C#.
  • The first hex digit of the third group and of the fourth group differ (hash bytes 6 and 8, counting from zero)
    • Java's UUID.nameUUIDFromBytes overwrites six bits (four version bits and two variant bits) to mark the result as a version 3 UUID, which explains the differences in those two bytes. This seems to be the roadblock.
  • I understand that Java uses signed bytes, while C# has unsigned bytes; this might be relevant as well.
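
For reference, the two differing digits can be reproduced in a few lines of Java by hashing the input and then applying the same version/variant masks that UUID.nameUUIDFromBytes applies. This is a minimal sketch, assuming UTF-8 input as in the C# code; the class NameUuidBits and the method md5WithVersionBits are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

class NameUuidBits {
    // Re-creates what UUID.nameUUIDFromBytes does: hash the input with MD5,
    // then overwrite the version nibble (byte 6) and the variant bits (byte 8).
    static UUID md5WithVersionBits(byte[] name) {
        byte[] md5;
        try {
            md5 = MessageDigest.getInstance("MD5").digest(name);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
        md5[6] = (byte) ((md5[6] & 0x0f) | 0x30); // version 3 (name-based, MD5)
        md5[8] = (byte) ((md5[8] & 0x3f) | 0x80); // IETF variant: top two bits 10
        ByteBuffer bb = ByteBuffer.wrap(md5);     // big-endian, as Java's UUID expects
        return new UUID(bb.getLong(), bb.getLong());
    }

    public static void main(String[] args) {
        byte[] input = "Hello World".getBytes(StandardCharsets.UTF_8);
        UUID manual = md5WithVersionBits(input);
        System.out.println(manual); // b10a8db1-64e0-3541-85b7-a99be72e3fe5
        System.out.println(manual.equals(UUID.nameUUIDFromBytes(input))); // true
    }
}
```

Byte 6 of the hash is 0x75 and becomes 0x35; byte 8 is 0x05 and becomes 0x85, which matches the Scala output above.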

Short of manipulating bytes, is there any other way to fix this?

Ari Krumbein asked Jul 26 '17 at 21:07


1 Answer

TL;DR

If you want your C# and your Java to act exactly the same way (and you are happy with the existing C# behaviour), you'll need to manually re-order some of the MD5 hash bytes (i.e. swap the entries you identified as out of order) before building the UUID.

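Additionally, you should not use the following, since it sets the UUID version and variant bits and so changes two of the hash bytes: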

UUID.nameUUIDFromBytes(to_encode.trim().getBytes())

But instead use:

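import java.nio.ByteBuffer;
import java.util.UUID;

// Builds a UUID from 16 bytes without altering any bits (read big-endian)
public static String getGuidFromByteArray(byte[] bytes) {
    ByteBuffer bb = ByteBuffer.wrap(bytes);
    long high = bb.getLong(); // bytes 0-7: most significant bits
    long low = bb.getLong();  // bytes 8-15: least significant bits
    UUID uuid = new UUID(high, low);
    return uuid.toString();
}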

Shamelessly stolen from https://stackoverflow.com/a/24409153/34092 :)
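
Putting the byte swap and the ByteBuffer approach together, here is a minimal Java sketch that should reproduce the C# Guid string from the question. It assumes the C# side hashes the UTF-8 bytes of the string (as GetMD5Hash does) and that new Guid(byte[]) reads the first 4-, 2- and 2-byte groups little-endian, which is what the question's C# output shows; DotNetGuid, fromDotNetBytes and md5 are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

class DotNetGuid {
    // Re-orders the bytes so that Java's big-endian UUID prints the same text
    // as .NET's new Guid(bytes): the 4-byte group and the two 2-byte groups
    // are reversed, the last 8 bytes are used as-is. No version bits are set.
    static UUID fromDotNetBytes(byte[] b) {
        if (b.length != 16) {
            throw new IllegalArgumentException("expected 16 bytes, got " + b.length);
        }
        byte[] r = b.clone();
        r[0] = b[3]; r[1] = b[2]; r[2] = b[1]; r[3] = b[0]; // first group (4 bytes)
        r[4] = b[5]; r[5] = b[4];                           // second group (2 bytes)
        r[6] = b[7]; r[7] = b[6];                           // third group (2 bytes)
        ByteBuffer bb = ByteBuffer.wrap(r); // big-endian by default
        return new UUID(bb.getLong(), bb.getLong());
    }

    // MD5 of the UTF-8 encoding of s, mirroring GetMD5Hash in the C# code
    static byte[] md5(String s) {
        try {
            return MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        UUID guid = fromDotNetBytes(md5("Hello World"));
        System.out.println(guid); // b18d0ab1-e064-4175-05b7-a99be72e3fe5
    }
}
```

Only the first eight bytes are moved; the last eight pass through unchanged, matching the last two groups that already agreed between the C# and Scala output.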

Additional Background

In case you weren't aware, when dealing with C#'s GUIDs:

Note that the order of bytes in the returned byte array is different from the string representation of a Guid value. The order of the beginning four-byte group and the next two two-byte groups is reversed, whereas the order of the last two-byte group and the closing six-byte group is the same. The example provides an illustration.

And:

The order of hexadecimal strings returned by the ToString method depends on whether the computer architecture is little-endian or big-endian.

In your C#, rather than using:

Console.WriteLine("Guid: {0}", guid);

you may want to consider using:

Console.WriteLine(BitConverter.ToString(guid.ToByteArray()));

Your existing code calls ToString behind the scenes. Alas, ToString and ToByteArray do not return the bytes in the same order.
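
Going the other way, a Java sketch can mimic BitConverter.ToString(guid.ToByteArray()), so that both sides can be compared as byte sequences rather than as strings. It assumes the ToByteArray ordering described in the documentation quoted above (first three groups reversed, last eight bytes unchanged); GuidBytes, toDotNetByteArray and bitConverterString are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.util.StringJoiner;
import java.util.UUID;

class GuidBytes {
    // Returns the 16 bytes in the order .NET's Guid.ToByteArray() uses:
    // the first three groups little-endian, the last 8 bytes unchanged.
    static byte[] toDotNetByteArray(UUID uuid) {
        byte[] be = ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
        byte[] le = be.clone();
        le[0] = be[3]; le[1] = be[2]; le[2] = be[1]; le[3] = be[0];
        le[4] = be[5]; le[5] = be[4];
        le[6] = be[7]; le[7] = be[6];
        return le;
    }

    // Formats bytes like BitConverter.ToString: uppercase hex pairs joined by '-'
    static String bitConverterString(byte[] bytes) {
        StringJoiner sj = new StringJoiner("-");
        for (byte b : bytes) {
            sj.add(String.format("%02X", b & 0xff));
        }
        return sj.toString();
    }

    public static void main(String[] args) {
        UUID guid = UUID.fromString("b18d0ab1-e064-4175-05b7-a99be72e3fe5");
        System.out.println(bitConverterString(toDotNetByteArray(guid)));
        // B1-0A-8D-B1-64-E0-75-41-05-B7-A9-9B-E7-2E-3F-E5
    }
}
```

For the question's Guid this prints the same hex as the MD5 line of the C# output, which is the point of comparing ToByteArray rather than ToString.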

mjwills answered Oct 09 '22 at 18:10