
How to check if a Unicode character has diacritics in .NET?

I am developing a heuristic for automatic language detection and would like to find out whether a given letter has diacritics (for example, in "Ðàäèî Êóëüòóðà" all letters have diacritics). Ideally I would also get the type of the diacritic.

I browsed through the UnicodeCategory enum but didn't find anything that could help me here.

Alexander Galkin asked Feb 19 '12 13:02



1 Answer

One possible way is to normalize the character to a form in which letters and their diacritics are written as separate code points, then check whether you have a letter followed by combining marks.

Adapting from "How do I remove diacritics (accents) from a string in .NET?", you can normalize with Normalize(NormalizationForm.FormD) and check for the diacritics with UnicodeCategory.NonSpacingMark.

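using System.Globalization;
using System.Linq;
using System.Text;

bool IsLetterWithDiacritics(char c)
{
    // FormD (canonical decomposition) splits a precomposed letter
    // into its base letter followed by combining marks.
    var s = c.ToString().Normalize(NormalizationForm.FormD);
    return (s.Length > 1) &&
           char.IsLetter(s[0]) &&
           s.Skip(1).All(c2 => CharUnicodeInfo.GetUnicodeCategory(c2) == UnicodeCategory.NonSpacingMark);
}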
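The question also asks for the type of diacritic. The same decomposition can report it, since each combining mark left after the base letter identifies one diacritic. The following is a minimal sketch under that assumption, not a definitive implementation. The class DiacriticInfo and the methods GetCombiningMarks and ToCodePoint are hypothetical names introduced here for illustration. As far as I know, the base class library has no built-in lookup for Unicode character names, so the sketch reports code points instead.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text;

public static class DiacriticInfo
{
    // Hypothetical helper: returns the combining marks that canonical
    // decomposition (FormD) produces for a single UTF-16 code unit.
    public static IReadOnlyList<char> GetCombiningMarks(char c)
    {
        // A lone surrogate is not valid Unicode text and cannot be normalized,
        // so report it as having no marks instead of letting Normalize throw.
        if (char.IsSurrogate(c)) return Array.Empty<char>();

        string decomposed = c.ToString().Normalize(NormalizationForm.FormD);
        return decomposed
            .Skip(1) // skip the base character
            .Where(m => CharUnicodeInfo.GetUnicodeCategory(m) == UnicodeCategory.NonSpacingMark)
            .ToArray();
    }

    // Formats a mark as "U+XXXX" so it can be looked up in the Unicode code charts.
    public static string ToCodePoint(char mark) => "U+" + ((int)mark).ToString("X4");
}
```

For example, GetCombiningMarks('à') should contain only U+0300 (combining grave accent), and 'ä' should give U+0308 (combining diaeresis). One caveat applies to this whole approach: letters such as 'Ð' (U+00D0) have no canonical decomposition, so they are reported as having no diacritics even though they look modified.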
CodesInChaos answered Nov 06 '22 10:11