UTF-16 Encoding in Java versus C#

I am trying to encode a String with the UTF-16 encoding scheme and compute its MD5 hash. But strangely, Java and C# return different results when I do this.

The following is the piece of code in Java:

import java.security.MessageDigest;

public class Md5Test {
    public static void main(String[] args) {
        String str = "preparar mantecado con coca cola";
        try {
            MessageDigest digest = MessageDigest.getInstance("MD5");
            // encode the string as UTF-16 bytes and feed them to MD5
            digest.update(str.getBytes("UTF-16"));
            byte[] hash = digest.digest();
            // print the digest as lowercase hex
            String output = "";
            for (byte b : hash) {
                output += Integer.toString((b & 0xff) + 0x100, 16).substring(1);
            }
            System.out.println(output);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

The output for this is: 249ece65145dca34ed310445758e5504

The following is the piece of code in C#:

public static string GetMD5Hash()
{
    string input = "preparar mantecado con coca cola";

    // encode the string as UTF-16 bytes
    byte[] bs = System.Text.Encoding.Unicode.GetBytes(input);

    // hash the bytes with MD5
    var md5 = new System.Security.Cryptography.MD5CryptoServiceProvider();
    bs = md5.ComputeHash(bs);

    // format the digest as lowercase hex
    var s = new System.Text.StringBuilder();
    foreach (byte b in bs)
    {
        s.Append(b.ToString("x2"));
    }

    string output = s.ToString();
    Console.WriteLine(output);
    return output;
}

The output for this is: c04d0f518ba2555977fa1ed7f93ae2b3

I am not sure why the outputs are not the same. How do we change the above pieces of code so that both of them return the same output?

asked Jan 25 '11 by rkg

People also ask

Does Java support UTF-16?

The native character encoding of the Java programming language is UTF-16. A charset in the Java platform therefore defines a mapping between sequences of sixteen-bit UTF-16 code units (that is, sequences of chars) and sequences of bytes.
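
As a quick illustration (a throwaway Java sketch, not part of the original question or answers), every char in a Java String is one UTF-16 code unit, so a UTF-16 charset turns each of them into two bytes:

import java.nio.charset.StandardCharsets;

public class CodeUnitDemo {
    public static void main(String[] args) {
        String s = "café";  // 4 chars, all in the Basic Multilingual Plane
        byte[] be = s.getBytes(StandardCharsets.UTF_16BE);

        System.out.println(s.length());  // 4 (UTF-16 code units)
        System.out.println(be.length);   // 8 (two bytes per code unit; UTF-16BE writes no BOM)
    }
}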

What is UTF-16 in Java?

UTF-16 (16-bit Unicode Transformation Format) is a character encoding capable of encoding all 1,112,064 valid code points of Unicode (in fact this number of code points is dictated by the design of UTF-16). The encoding is variable-length, as code points are encoded with one or two 16-bit code units.
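
For example, here is a small illustrative sketch of the one-versus-two code unit case: a code point outside the Basic Multilingual Plane is stored as a surrogate pair, i.e. two 16-bit code units for a single code point.

public class SurrogatePairDemo {
    public static void main(String[] args) {
        String bmp = "A";               // U+0041, one 16-bit code unit
        String emoji = "\uD83D\uDE00";  // U+1F600, stored as a surrogate pair

        System.out.println(bmp.length());                              // 1 code unit
        System.out.println(emoji.length());                            // 2 code units
        System.out.println(emoji.codePointCount(0, emoji.length()));   // 1 code point
    }
}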

Does UTF-16 support all languages?

UTF-16 supports all languages that are supported by Unicode, though some characters are encoded not with 16 bits but with 32. And strictly speaking, Unicode does not support languages but scripts.

Why does Java use UTF-16?

Unicode was originally designed as a fixed-width 16-bit character encoding. The primitive data type char in the Java programming language was intended to take advantage of this design by providing a simple data type that could hold any character.


2 Answers

UTF-16 != UTF-16.

In Java, getBytes("UTF-16") returns a big-endian representation prefixed with a byte-order mark (BOM). C#'s System.Text.Encoding.Unicode.GetBytes returns a little-endian representation without a BOM. I can't check your code from here, but I think you'll need to specify the conversion precisely.

Try getBytes("UTF-16LE") in the Java version.
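
A minimal sketch of what that change could look like, assuming the goal is to match the C# program above (which hashes little-endian UTF-16 bytes with no BOM); the class name is just for illustration:

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class Md5Utf16LeSketch {
    public static void main(String[] args) throws Exception {
        String str = "preparar mantecado con coca cola";

        // UTF_16LE produces little-endian bytes with no byte-order mark,
        // matching what C#'s Encoding.Unicode.GetBytes emits
        byte[] bytes = str.getBytes(StandardCharsets.UTF_16LE);

        MessageDigest digest = MessageDigest.getInstance("MD5");
        byte[] hash = digest.digest(bytes);

        StringBuilder hex = new StringBuilder();
        for (byte b : hash) {
            hex.append(String.format("%02x", b));
        }
        System.out.println(hex);  // should print the same hash as the C# version
    }
}

Going the other way, Java's StandardCharsets.UTF_16BE together with C#'s Encoding.BigEndianUnicode should also line up, since neither of them writes a BOM.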

answered Nov 15 '22 by Nordic Mainframe


The first thing I can find, and this might not be the only problem, is that C#'s Encoding.Unicode.GetBytes() is little-endian, while Java's natural byte order is big-endian.
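
A tiny sketch (illustrative only) that makes the byte-order difference visible by dumping the raw bytes each Java charset produces for a single character:

import java.nio.charset.StandardCharsets;

public class ByteOrderDemo {
    public static void main(String[] args) {
        String s = "A";  // U+0041

        print("UTF-16  ", s.getBytes(StandardCharsets.UTF_16));    // fe ff 00 41 (BOM + big-endian)
        print("UTF-16BE", s.getBytes(StandardCharsets.UTF_16BE));  // 00 41 (big-endian, no BOM)
        print("UTF-16LE", s.getBytes(StandardCharsets.UTF_16LE));  // 41 00 (little-endian, no BOM)
    }

    private static void print(String label, byte[] bytes) {
        StringBuilder sb = new StringBuilder(label);
        for (byte b : bytes) {
            sb.append(String.format(" %02x", b));
        }
        System.out.println(sb);
    }
}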

answered Nov 15 '22 by Mark McKenna