
Trying to delete non-ASCII characters only [duplicate]

Tags: regex, linux, sed, tr

I am trying to remove non-ASCII characters from a text file. I don't want to remove the whole line, only the offending characters. I am trying to get the following expression to work:

sed '/[\x80-\xFF]/d'

M_x_r asked Feb 22 '13 23:02



1 Answer

The suggested solutions may fail with specific versions of sed, e.g. GNU sed 4.2.1.
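
For context, the expression in the question uses sed's d command, which deletes every line that matches the pattern; removing only the matched characters calls for a substitution (s///g) instead. A minimal sketch of that difference follows. ASCII digits stand in for the offending characters here (an assumption made for illustration) so that the result does not depend on how a given sed version handles \x80-\xFF:

```shell
# Digits play the role of the "offending" characters in this sketch.
with_d=$(printf 'a1b\nxyz\n' | sed '/[0-9]/d')     # d: drops the whole matching line
with_s=$(printf 'a1b\nxyz\n' | sed 's/[0-9]//g')   # s///g: drops only the matched characters

printf '%s\n' "$with_d"   # prints: xyz
printf '%s\n' "$with_s"   # prints: ab, then xyz on the next line
```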

Using tr:

tr -cd '[:print:]' < yourfile.txt

This removes every character outside [\x20-\x7e] (assuming the C/POSIX locale), which includes line feeds and tabs.
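
A small sketch of what this does to a sample string. The sample text and the LC_ALL=C prefix are assumptions added here: the prefix makes tr classify bytes by their ASCII values regardless of the current locale.

```shell
# Sample input: "café naïve" encoded as UTF-8, followed by a newline.
# Octal escapes keep the script itself pure ASCII.
sample='caf\303\251 na\303\257ve\n'

# Delete (-d) the complement (-c) of the printable class.
cleaned=$(printf "$sample" | LC_ALL=C tr -cd '[:print:]')

printf '%s\n' "$cleaned"   # prints: caf nave
```

Both bytes of each accented letter are dropped, and so is the trailing newline, since a line feed is not a printable character.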

To keep line feeds, for example, add \n to the set:

tr -cd '[:print:]\n' < yourfile.txt
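
Since tr only reads standard input and writes standard output, changing the file itself takes an extra step. One possible approach, sketched below, writes to a temporary file and then replaces the original. The working directory and sample content are hypothetical and are created here only so the sketch runs on its own.

```shell
# Hypothetical sample file in a scratch directory.
workdir=$(mktemp -d)
infile="$workdir/yourfile.txt"
printf 'caf\303\251\nna\303\257ve\n' > "$infile"

# Write the cleaned text to a temporary file, then replace the original
# only if tr succeeded.
tmpfile=$(mktemp)
LC_ALL=C tr -cd '[:print:]\n' < "$infile" > "$tmpfile" && mv "$tmpfile" "$infile"

cat "$infile"
# caf
# nave
```

Redirecting straight back to the same file (tr ... < file > file) would truncate it before tr reads anything, which is why the temporary file is used.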

If you really want to keep all ASCII characters (even the control codes):

tr -cd '[:print:][:cntrl:]' < yourfile.txt

This will remove any characters not in [\x00-\x7f].
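
To check the result, one option is to count the bytes that fall outside the ASCII range before and after filtering. The helper name count_non_ascii and the sample input are assumptions made for this sketch:

```shell
# Count non-ASCII bytes: delete every byte in \000-\177 and count what is left.
count_non_ascii() {
    LC_ALL=C tr -d '\000-\177' | wc -c | tr -d ' '
}

# Sample contains a tab, a UTF-8 "é" (two bytes), and a newline.
before=$(printf 'tab\there caf\303\251\n' | count_non_ascii)
after=$(printf 'tab\there caf\303\251\n' | LC_ALL=C tr -cd '[:print:][:cntrl:]' | count_non_ascii)

echo "before=$before after=$after"   # prints: before=2 after=0
```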

speakr answered Oct 02 '22 08:10