
How can non-ASCII characters be removed from a string?

I have strings such as "A função" and "Ãugent", from which I need to remove characters like ç, ã, and à (that is, replace them with empty strings).

How can I remove those non-ASCII characters from my string?

I tried the following function, but it does not work properly. One problem is that the unwanted characters are replaced by a space character instead of being removed.

public static String matchAndReplaceNonEnglishChar(String tmpsrcdta) {
    String newsrcdta = null;
    char array[] = Arrays.stringToCharArray(tmpsrcdta);
    if (array == null)
        return newsrcdta;

    for (int i = 0; i < array.length; i++) {
        int nVal = (int) array[i];
        // Is character ISO control
        boolean bISO = Character.isISOControl(array[i]);
        // Is Ignorable identifier
        boolean bIgnorable = Character.isIdentifierIgnorable(array[i]);
        // Remove tab and other unwanted characters..
        if (nVal == 9 || bISO || bIgnorable)
            array[i] = ' ';
        else if (nVal > 255)
            array[i] = ' ';
    }
    newsrcdta = Arrays.charArrayToString(array);

    return newsrcdta;
}
rahulsri asked Dec 15 '11 11:12





1 Answer

This finds every non-ASCII character (anything outside the range \x00-\x7F) and replaces it with an empty string:

String resultString = subjectString.replaceAll("[^\\x00-\\x7F]", ""); 
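To see how that one-liner behaves on the strings from the question, here is a minimal, self-contained sketch. The class name AsciiFilter and both method names are made up for illustration; they are not part of any library. The second method is a loop-based equivalent, included only to show that characters can be skipped rather than overwritten with ' ', which is what the original function does. The accented letters are written as Unicode escapes so the result does not depend on the source file's encoding.

```java
final class AsciiFilter {

    // Regex approach from the answer: delete every char outside U+0000..U+007F.
    static String removeNonAscii(String input) {
        return input.replaceAll("[^\\x00-\\x7F]", "");
    }

    // Loop-based equivalent (hypothetical helper): copy only ASCII chars,
    // so removed characters leave no space behind.
    static String removeNonAsciiLoop(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c <= 0x7F) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String first = "A fun\u00E7\u00E3o";  // "A função"
        String second = "\u00C3ugent";        // "Ãugent"

        System.out.println(removeNonAscii(first));      // prints "A funo"
        System.out.println(removeNonAscii(second));     // prints "ugent"
        System.out.println(removeNonAsciiLoop(first));  // prints "A funo"
    }
}
```

Note that both versions drop the accented letters entirely, which is what the question asks for. ASCII control characters such as tab (code 9) are inside the \x00-\x7F range, so they are kept; they would need a separate rule if they should go as well.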
FailedDev answered Sep 30 '22 17:09
