I have the following example code:
String n = "Péña";
n = Normalizer.normalize(n, Normalizer.Form.NFC);
How do I normalize the string n, except for the ñ?
And not only for that string: I'm building a form, and I want to keep just the ñ's and strip the diacritics from everything else.
Replace all occurrences of "ñ" with a non-printable character "\001", so "Péña" becomes "Pé\001a". Then call Normalizer.normalize() with Normalizer.Form.NFD (not NFC, as in the question) to decompose the "é" into "e" and a separate diacritical mark. Finally, remove the diacritical marks and convert the non-printable character back to "ñ".
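import java.text.Normalizer;

String partiallyNormalize(String string)
{
    // Protect each ñ by swapping it for a placeholder that NFD leaves untouched.
    string = string.replace('ñ', '\001');
    // Decompose accented letters into a base letter plus combining marks.
    string = Normalizer.normalize(string, Normalizer.Form.NFD);
    // Strip the combining marks, leaving only the base letters.
    string = string.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    // Restore the protected ñ.
    string = string.replace('\001', 'ñ');
    return string;
}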
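One caveat with the method above: it only protects a lowercase ñ that arrives as a single precomposed character. Form input can also contain an uppercase Ñ, or an "n" followed by a separate combining tilde, and both of those would lose the tilde. Here is a rough sketch of a variant that covers those cases, assuming the input never contains the characters \001 or \002 itself. The class name PartialNormalizer and the second placeholder \002 are my own additions for illustration, not part of the original answer.

```java
import java.text.Normalizer;

final class PartialNormalizer {
    // Placeholders for the protected letters (assumption: never present in real input).
    private static final char LOWER_PLACEHOLDER = '\001';
    private static final char UPPER_PLACEHOLDER = '\002';

    static String partiallyNormalize(String input) {
        // Compose first, so "n" followed by a combining tilde becomes the single character ñ.
        String s = Normalizer.normalize(input, Normalizer.Form.NFC);
        // Hide ñ (U+00F1) and Ñ (U+00D1) behind the placeholders.
        s = s.replace('\u00F1', LOWER_PLACEHOLDER).replace('\u00D1', UPPER_PLACEHOLDER);
        // Decompose the remaining accented letters into base letter plus combining marks.
        s = Normalizer.normalize(s, Normalizer.Form.NFD);
        // Drop the combining marks.
        s = s.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
        // Put ñ and Ñ back.
        return s.replace(LOWER_PLACEHOLDER, '\u00F1').replace(UPPER_PLACEHOLDER, '\u00D1');
    }
}
```

With this, PartialNormalizer.partiallyNormalize("Péña") returns "Peña" whether the ñ was typed as one character or as "n" plus a combining tilde, and "Ñandú" comes back as "Ñandu".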
You might also want to upvote the preferred answer to "Easy way to remove UTF-8 accents from a string?", which is where I learned how to remove the diacritical marks.