I'm porting a C# script into Spark (Scala) and I'm running into an issue with UUID generation in Scala vs GUID generation in C#.
Is there any way to generate a UUID in Java that is identical to the one generated in C#?
I'm generating the primary key for a database by creating a Guid from the MD5 hash of a string. Ultimately, I'd like to generate UUIDs in Java/Scala that match those from the C# script, so the existing data in the database that used the C# implementation for hashing doesn't need to be rehashed.
C# to port:
String ex = "Hello World";
Console.WriteLine("String to Hash: {0}", ex);
byte[] md5 = GetMD5Hash(ex);
Console.WriteLine("Hash: {0}", BitConverter.ToString(md5));
Guid guid = new Guid(md5);
Console.WriteLine("Guid: {0}", guid);
// Returns the 16-byte MD5 digest of the UTF-8 encoding of s.
private static byte[] GetMD5Hash(string s) {
    using (MD5 md5 = MD5.Create())
        return md5.ComputeHash(Encoding.UTF8.GetBytes(s));
}
Scala ported code:
import java.security.MessageDigest
import java.util.UUID
val to_encode = "Hello World"
val md5hash = MessageDigest.getInstance("MD5")
.digest(to_encode.trim().getBytes())
val md5string = md5hash.map("%02x-".format(_)).mkString
val uuid_bytes = UUID.nameUUIDFromBytes(to_encode.trim().getBytes())
printf("String to encode: %s\n", to_encode)
printf("MD5: %s\n", md5string)
printf("UUID: %s\n", uuid_bytes.toString)
Result from C#
Result from Scala
What works:
What doesn't:
Short of manipulating bytes, is there any other way to fix this?
If you want your C# and your Java to act exactly the same way (and you are happy with the existing C# behaviour), you'll need to manually re-order some of the bytes in uuid_bytes (i.e. swap some of the entries you identified as out of order).
Additionally, you should not use:
UUID.nameUUIDFromBytes(to_encode.trim().getBytes())
But instead use:
public static String getGuidFromByteArray(byte[] bytes) {
    ByteBuffer bb = ByteBuffer.wrap(bytes);
    long high = bb.getLong();
    long low = bb.getLong();
    UUID uuid = new UUID(high, low);
    return uuid.toString();
}
Shamelessly stolen from https://stackoverflow.com/a/24409153/34092 :)
In case you weren't aware, when dealing with C#'s GUIDs:
Note that the order of bytes in the returned byte array is different from the string representation of a Guid value. The order of the beginning four-byte group and the next two two-byte groups is reversed, whereas the order of the last two-byte group and the closing six-byte group is the same. The example provides an illustration.
And:
The order of hexadecimal strings returned by the ToString method depends on whether the computer architecture is little-endian or big-endian.
In your C#, rather than using:
Console.WriteLine("Guid: {0}", guid);
you may want to consider using:
Console.WriteLine(BitConverter.ToString(guid.ToByteArray()));
Your existing code calls ToString behind the scenes. Alas, ToString and ToByteArray do not return the bytes in the same order.
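To make the byte re-ordering concrete, here is a minimal Java sketch. It assumes the layout described in the documentation quoted above: the first four-byte group and the next two two-byte groups are reversed, and the remaining eight bytes are left alone. The class and method names (DotNetGuid, fromDotNetBytes, md5) are made up for illustration and are not part of any library.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

// Hypothetical helper class; names are illustrative only.
public class DotNetGuid {

    // Builds a java.util.UUID whose toString() should match the string C# prints
    // for new Guid(bytes), assuming the byte layout quoted from the Guid docs.
    static UUID fromDotNetBytes(byte[] bytes) {
        if (bytes.length != 16) {
            throw new IllegalArgumentException("expected 16 bytes, got " + bytes.length);
        }
        byte[] b = bytes.clone(); // leave the caller's array untouched
        // Reverse the first four-byte group.
        swap(b, 0, 3);
        swap(b, 1, 2);
        // Reverse each of the next two two-byte groups.
        swap(b, 4, 5);
        swap(b, 6, 7);
        // The last eight bytes keep their order. ByteBuffer reads big-endian by default.
        ByteBuffer bb = ByteBuffer.wrap(b);
        long high = bb.getLong();
        long low = bb.getLong();
        return new UUID(high, low);
    }

    private static void swap(byte[] b, int i, int j) {
        byte tmp = b[i];
        b[i] = b[j];
        b[j] = tmp;
    }

    // MD5 of the UTF-8 bytes, mirroring Encoding.UTF8.GetBytes in the C# code.
    static byte[] md5(String s) throws NoSuchAlgorithmException {
        return MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        System.out.println("UUID: " + fromDotNetBytes(md5("Hello World")));
    }
}
```

As a quick sanity check, passing the bytes 00 01 02 ... 0f to fromDotNetBytes gives 03020100-0504-0706-0809-0a0b0c0d0e0f, which shows the first three groups reversed and the rest unchanged. Compare the output for a few of your real keys against the C# values before relying on it.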