
Easy way to remove accents from a Unicode string? [duplicate]

I want to change this sentence:

Et ça sera sa moitié.

to:

Et ca sera sa moitie.

Is there an easy way to do this in Java, like I would in Objective-C?

NSString *str = @"Et ça sera sa moitié.";
NSData *data = [str dataUsingEncoding:NSASCIIStringEncoding allowLossyConversion:YES];
NSString *newStr = [[NSString alloc] initWithData:data encoding:NSASCIIStringEncoding];
Rob asked Mar 03 '13 20:03





2 Answers

Finally, I've solved it by using the Normalizer class.

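import java.text.Normalizer;

public static String stripAccents(String s) {
    // NFD splits each accented character into a base letter plus combining marks.
    s = Normalizer.normalize(s, Normalizer.Form.NFD);
    // Drop the combining marks so that only the base letters remain.
    s = s.replaceAll("[\\p{InCombiningDiacriticalMarks}]", "");
    return s;
}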
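For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea. The class name AccentStripper and the main method are only for illustration, and the input is written with Unicode escapes so the result does not depend on the source file encoding.

```java
import java.text.Normalizer;

// Hypothetical wrapper class, used only to make the snippet runnable on its own.
final class AccentStripper {

    static String stripAccents(String s) {
        // Decompose accented characters into base letter + combining marks (NFD).
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        // Remove every character from the Combining Diacritical Marks block.
        return decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    public static void main(String[] args) {
        // "Et ça sera sa moitié." from the question, written with escapes.
        String input = "Et \u00e7a sera sa moiti\u00e9.";
        System.out.println(stripAccents(input)); // prints "Et ca sera sa moitie."
    }
}
```

The only difference from the method above is the trailing + in the pattern, which removes runs of marks in one match; the result should be the same.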
Rob answered Oct 08 '22 19:10



Perhaps the easiest and safest way is to use StringUtils from Apache Commons Lang:

StringUtils.stripAccents(String input) 

Removes diacritics (~= accents) from a string. The case will not be altered. For instance, 'à' will be replaced by 'a'. Note that ligatures will be left as is.

StringUtils.stripAccents()
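The note about ligatures also applies if you go the plain java.text.Normalizer route: characters such as œ and æ have no canonical decomposition, so NFD leaves them untouched. The sketch below uses only the JDK to show that. The class LigatureCheck and the helper expandLigatures are hypothetical names made up for this example and are not part of Commons Lang; the two-entry mapping is just an assumption about what you might want.

```java
import java.text.Normalizer;

// Hypothetical demo class; nothing here comes from Apache Commons Lang.
final class LigatureCheck {

    static String stripAccents(String s) {
        // NFD decomposition followed by removal of combining marks.
        return Normalizer.normalize(s, Normalizer.Form.NFD)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    // Hypothetical helper: spell out two common ligatures by hand.
    static String expandLigatures(String s) {
        return s.replace("\u0153", "oe")   // œ
                .replace("\u00e6", "ae");  // æ
    }

    public static void main(String[] args) {
        String word = "c\u0153ur fran\u00e7ais"; // "cœur français"
        String stripped = stripAccents(word);
        // The cedilla is gone, but the œ ligature is still there.
        System.out.println(stripped.contains("\u0153")); // prints "true"
        System.out.println(expandLigatures(stripped));   // prints "coeur francais"
    }
}
```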

Ondrej Bozek answered Oct 08 '22 21:10
