
How do I tell whether a character in a Unicode string belongs to a given language? [duplicate]

Tags: string, c#, unicode

Possible Duplicate:
Return the language of a given string

The task is to sort a list of strings, giving priority to a specific language. The strings can be written in different languages, such as Chinese, English, or Russian. I need to take all the Chinese strings first, and then the rest.

To do this, I want to find out which language a particular character in a string belongs to (for example, the first letter).

Are there any classes or methods for this?

Mixer asked Jan 25 '13 17:01


2 Answers

If we're talking about alphabets, then you can simply check the integer representation of a char by casting it:

int unicodeValue = (int)myString[0];

Then, using a table of Unicode blocks, you check whether the value falls within the range used by a language's script.
For example, 丐 is 19984, which is 4E10 in hexadecimal (19984.ToString("X")), placing it in the CJK Unified Ideographs block (U+4E00 to U+9FFF). This looks like the block for Chinese characters, but it is shared with Japanese and Korean, so you need to dig around and make sure.
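The range check described above can be sketched in C# roughly as follows, together with the "Chinese first" ordering the question asks for. This is only a minimal sketch: the class and method names (ScriptSort, IsCjkIdeograph, CjkFirst) are made up for illustration, and it assumes that testing the first UTF-16 char against the CJK Unified Ideographs block (U+4E00 to U+9FFF) is good enough. It cannot tell Chinese from Japanese or Korean text, and it ignores the CJK extension blocks.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper; the names here are illustrative, not a framework API.
public static class ScriptSort
{
    // True if c lies in the CJK Unified Ideographs block (U+4E00..U+9FFF).
    // Assumption: this single block is a sufficient test for "Chinese".
    public static bool IsCjkIdeograph(char c)
    {
        return c >= '\u4E00' && c <= '\u9FFF';
    }

    // Returns the strings whose first char is a CJK ideograph, then the rest.
    // Enumerable.OrderBy is a stable sort, so the original relative order
    // is preserved inside each of the two groups.
    public static List<string> CjkFirst(IEnumerable<string> items)
    {
        return items
            .OrderBy(s => (!string.IsNullOrEmpty(s) && IsCjkIdeograph(s[0])) ? 0 : 1)
            .ToList();
    }
}
```

For example, ScriptSort.CjkFirst(new[] { "hello", "中文", "привет", "汉字" }) yields "中文", "汉字", "hello", "привет". If a proper alphabetical order is also needed within each group, a ThenBy with a suitable StringComparer could be chained after the OrderBy.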

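Now, if we're talking about determining which language a particular word comes from, that is a harder problem. You could look into Soundex-style algorithms, but note that Soundex matches words by pronunciation and does not by itself identify a language.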

Louis Kottmann answered Oct 15 '22 10:10


Try this link:

How to detect the language of a string?

The code (copied from there) is:

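// Note: google.language belongs to the old Google AJAX Language API,
// which has since been deprecated, so this snippet may no longer run as-is.
var text = "¿Dónde está el baño?";
google.language.detect(text, function(result) {
  if (!result.error) {
    var language = 'unknown';
    // Map the returned language code back to its name.
    for (var l in google.language.Languages) {
      if (google.language.Languages[l] == result.language) {
        language = l;
        break;
      }
    }
    var container = document.getElementById("detection");
    container.innerHTML = text + " is: " + language;
  }
});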
D J answered Oct 15 '22 10:10