I am trying to manipulate a text file and remove non-ASCII characters from the text. I don't want to remove the line. I only want to remove the offending characters. I am trying to get the following expression to work:
sed '/[\x80-\xFF]/d'
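The `d` command in the expression above deletes every line that matches the pattern, which is why whole lines disappear. A minimal sketch of a substitution that removes only the matching bytes, assuming GNU sed (for the `\xHH` escapes) and the byte-oriented `C` locale; the sample input is made up for illustration:

```shell
# Assumes GNU sed. LC_ALL=C makes sed match single bytes instead of multibyte characters.
# s/[...]//g replaces each byte in \x80-\xFF with nothing; the rest of the line is kept.
# Sample input: "café au lait" in UTF-8 (é is the two bytes \303\251), then a pure-ASCII line.
printf 'caf\303\251 au lait\nplain line\n' | LC_ALL=C sed 's/[\x80-\xFF]//g'
```

Other sed implementations may not understand `\x80`, so treat this as a starting point rather than a portable recipe.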
Bring up the command palette with CTRL+SHIFT+P (Windows, Linux) or CMD+SHIFT+P (Mac). Type "Remove Non Ascii Chars" until the commands appear. Select "Remove non Ascii characters (File)" to remove them from the entire file, or "Remove non Ascii characters (Select)" to remove them only from the selected text.
Removing non-printable, non-ASCII characters: one way to remove such data is to have the SUBSTITUTE function convert it into an ASCII character that the CLEAN function can remove. You can nest the SUBSTITUTE and CLEAN functions to make this easier.
The suggested solutions may fail with specific versions of sed, e.g. GNU sed 4.2.1.
Using tr:
tr -cd '[:print:]' < yourfile.txt
This will remove any characters not in [\x20-\x7e].
If you want to keep e.g. line feeds, just add \n:
tr -cd '[:print:]\n' < yourfile.txt
If you really want to keep all ASCII characters (even the control codes):
tr -cd '[:print:][:cntrl:]' < yourfile.txt
This will remove any characters not in [\x00-\x7f].
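To compare the three tr variants side by side, here is a small sketch run on a made-up sample. The `sample` function and its input are illustrative only. `LC_ALL=C` is added as a precaution: some tr implementations take their character classes from the current locale, and in a UTF-8 locale they may treat accented letters as printable.

```shell
# Hypothetical sample input: "café", a tab, "au lait", a newline.
# é is written as its two UTF-8 bytes, \303\251.
sample() { printf 'caf\303\251\tau lait\n'; }

# LC_ALL=C makes tr work byte by byte, so [:print:] means \x20-\x7e only.
sample | LC_ALL=C tr -cd '[:print:]'            # drops é, the tab, and the newline
echo                                            # restore the newline removed above
sample | LC_ALL=C tr -cd '[:print:]\n'          # drops é and the tab, keeps the newline
sample | LC_ALL=C tr -cd '[:print:][:cntrl:]'   # drops only é
```

tr reads standard input and writes standard output, so to change a file, redirect the result to a new file and then replace the original.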