
String to character array returning different result in Visual Studio and Android Studio

The string that I want to convert into a character array is ষ্টোর, a Bengali word stored as Unicode text.

The problem is that converting it in Visual Studio (C#) returns 6 characters, while converting it in Android Studio (Java) returns 5 characters.

In VS I am using char[] arrayOfChars = someString.ToCharArray(); and in Android Studio char[] arrayOfChars = someString.toCharArray();

Visual Studio Debugging info

Android Studio Debugging info

N.B.: My Android Studio IDE and project encoding are both UTF-8. I expect Android Studio to give the same result as Visual Studio.

bluetoothfx asked Apr 04 '17 20:04




1 Answer

Those two arrays are canonically equivalent in Unicode, but they are represented in different normalization forms. Note that neither Java's toCharArray() nor C#'s ToCharArray() normalizes anything; each simply copies the string's UTF-16 code units. So what seems to be happening is that the string itself reaches the Java side in one normalization form and the C# side in another.
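One way to check which form each array holds is to print its UTF-16 code units in hex. The following is only a sketch: the class name CodePointDump and the helper dump are made up for illustration, and the two string literals are my assumption about what the two debuggers are showing (a precomposed vowel sign U+09CB in the 5-character array, and the pair U+09C7 U+09BE in the 6-character one).

```java
public class CodePointDump {
    // Hypothetical helper: renders each UTF-16 code unit of s as U+XXXX.
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumed 5-unit spelling: the vowel sign is the single code point U+09CB.
        String composed = "\u09B7\u09CD\u099F\u09CB\u09B0";
        // Assumed 6-unit spelling: the vowel sign is split into U+09C7 and U+09BE.
        String decomposed = "\u09B7\u09CD\u099F\u09C7\u09BE\u09B0";

        System.out.println(dump(composed));   // U+09B7 U+09CD U+099F U+09CB U+09B0
        System.out.println(dump(decomposed)); // U+09B7 U+09CD U+099F U+09C7 U+09BE U+09B0
    }
}
```

Running the same kind of dump on the real string in both IDEs should show whether the only difference is that one vowel sign.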

This page contains a chart of different normalization forms for Bengali text - the fourth row there describes exactly what you're seeing:

Bengali table

I am only learning about this now, but it seems to me that the motivation for this is so that Unicode implementations could remain compatible with pre-existing encodings wherever possible and practical.

For example, one pre-existing encoding may have used a single character, while another pre-existing encoding may have instead used two characters combined. The solution settled on by the Unicode folks is thus to support both, at the cost of not having a single "canonical" representation, as you've encountered here.

If you wish for your Java array to be normalized under the "D" normalization form (NFD) that your C# array seems to be using, it appears that java.text.Normalizer provides such a function. You may be looking for something like:

someString = Normalizer.normalize(someString, Normalizer.Form.NFD);
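To see the effect on the array length, here is a small self-contained sketch using java.text.Normalizer. The class name NormalizationDemo and the helpers nfdLength and nfcLength are hypothetical, and the literal assumes the Java string holds the precomposed vowel sign U+09CB, which I have not confirmed against the original project.

```java
import java.text.Normalizer;

public class NormalizationDemo {
    // Assumed input: the word spelled with the precomposed vowel sign U+09CB.
    static final String COMPOSED = "\u09B7\u09CD\u099F\u09CB\u09B0";

    // Length of the char array after canonical decomposition (NFD).
    static int nfdLength(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).toCharArray().length;
    }

    // Length of the char array after canonical composition (NFC).
    static int nfcLength(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC).toCharArray().length;
    }

    public static void main(String[] args) {
        System.out.println(COMPOSED.toCharArray().length); // 5
        System.out.println(nfdLength(COMPOSED));           // 6
        System.out.println(nfcLength(COMPOSED));           // 5

        // Once both sides use the same form, the strings compare equal.
        String decomposed = "\u09B7\u09CD\u099F\u09C7\u09BE\u09B0";
        System.out.println(Normalizer.normalize(COMPOSED, Normalizer.Form.NFD).equals(decomposed)); // true
    }
}
```

If the C# side should instead match Java, normalizing there to form C would be the mirror-image fix; either way, the point is to pick one form and apply it on both platforms before comparing or splitting into characters.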

Unicode Standard Annex #15 (UAX #15, "Unicode Normalization Forms") is the official document that describes these normalization forms.

Jeremy answered Oct 19 '22 06:10
