
How do I remove all non-ASCII characters with regex and Notepad++?

This expression will search for non-ASCII values:

[^\x00-\x7F]+

Select 'Search Mode = Regular expression' and click Find Next.

Source: Regex any ASCII character
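
Outside Notepad++, the same character class can be tried in a script. The following is a minimal Python sketch, not part of the original answer; the function name `find_non_ascii` and the sample string are made up for illustration.

```python
import re

# Same pattern as above: one or more characters outside the range 0x00-0x7F.
NON_ASCII = re.compile(r"[^\x00-\x7F]+")

def find_non_ascii(text):
    """Return (start_offset, matched_text) for each run of non-ASCII characters."""
    return [(m.start(), m.group()) for m in NON_ASCII.finditer(text)]

# Hypothetical sample: "café costs 5€, naïve", written with escapes for clarity.
sample = "caf\u00e9 costs 5\u20ac, na\u00efve"
for offset, run in find_non_ascii(sample):
    print(offset, repr(run))
```

Each printed offset corresponds to one stop of "Find Next" in the editor.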


In Notepad++, if you go to the menu Search > Find characters in range > Non-ASCII Characters (128-255), you can then step through the document to each non-ASCII character.

Be sure to tick "Wrap around" if you want the search to loop through the whole document for all non-ASCII characters.

screenshot "Find in Range"


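In addition to the answer by ProGM: if you see characters shown in boxes, such as NUL or ACK, and want to get rid of them, those are ASCII control characters (0 to 31). Note that this range also includes tab, line feed, and carriage return, so removing every match also removes line breaks. You can find them with the following expression and remove them: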

[\x00-\x1F]+
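
As a rough illustration of what this class matches, here is a small Python sketch (not from the original answer). The second pattern, which skips tab, line feed, and carriage return, is an assumption about what many users actually want; both function names are hypothetical.

```python
import re

# The pattern from the answer: every ASCII control character, 0x00 through 0x1F.
CONTROL = re.compile(r"[\x00-\x1F]+")

# Assumed variant: same range minus tab (0x09), LF (0x0A) and CR (0x0D),
# so that indentation and line breaks survive.
CONTROL_KEEP_LAYOUT = re.compile(r"[\x00-\x08\x0B\x0C\x0E-\x1F]+")

def strip_control(text):
    """Remove all control characters, including tabs and line breaks."""
    return CONTROL.sub("", text)

def strip_control_keep_layout(text):
    """Remove control characters but keep tab, LF and CR."""
    return CONTROL_KEEP_LAYOUT.sub("", text)

print(repr(strip_control("abc\x00def\x06\nghi")))
print(repr(strip_control_keep_layout("abc\x00def\x06\nghi")))
```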

To remove all non-ASCII characters and all ASCII control characters together, remove everything matching this regex (the range starts at \x20, the space character, because \x1F is itself a control character):

[^\x20-\x7F]+
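
A short Python sketch of the combined removal follows. It assumes the intent is to keep only the characters from space (0x20) through 0x7F, so the class is written with \x20 as its lower bound; the function name `keep_plain_ascii` is hypothetical. Because line feeds and tabs are below 0x20, they are removed as well.

```python
import re

# Assumption: keep only 0x20 (space) through 0x7F; everything else is deleted.
NOT_PLAIN_ASCII = re.compile(r"[^\x20-\x7F]+")

def keep_plain_ascii(text):
    """Delete control characters (0x00-0x1F) and all non-ASCII characters."""
    return NOT_PLAIN_ASCII.sub("", text)

# Hypothetical input containing NUL, a non-ASCII letter and the 0x1F control character.
print(repr(keep_plain_ascii("ok\x00 caf\u00e9\x1f!")))
```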

To remove all non-ASCII characters, search for the following pattern and replace it with nothing: [^\x00-\x7F]+

Removing non-ASCII

To highlight characters, I recommend using the Mark function in the search window: it highlights non-ASCII characters and puts a bookmark on each line containing one of them.

If you want to highlight and put a bookmark on the ASCII characters instead, you can use the regex [\x00-\x7F] to do so.

Highlighting Non-ASCII
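
For files checked outside the editor, a rough script equivalent of bookmarking is to list the line numbers that contain a non-ASCII character. This is only a sketch; the function name `lines_with_non_ascii` is made up, and lines are assumed to be separated by "\n".

```python
import re

# A single non-ASCII character is enough to flag a line.
NON_ASCII = re.compile(r"[^\x00-\x7F]")

def lines_with_non_ascii(text):
    """Return 1-based numbers of the lines containing a non-ASCII character."""
    return [
        number
        for number, line in enumerate(text.split("\n"), start=1)
        if NON_ASCII.search(line)
    ]

# Hypothetical four-line document; lines 2 and 4 contain non-ASCII characters.
document = "plain\ncaf\u00e9\nalso plain\n\u20ac5"
print(lines_with_non_ascii(document))
```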

Cheers