
How to find and remove the invisible characters in a text file using Emacs

Tags:

emacs

I have a text file named COPYING that was edited on Windows. It contains Windows-style line endings:

$ file COPYING 
COPYING: ASCII English text, with CRLF line terminators

I tried to convert it to Unix style using dos2unix. This is the output:

$ dos2unix COPYING 
dos2unix: Skipping binary file COPYING

I was surprised to find that dos2unix reports it as a binary file. Using another editor (not Emacs), I found that the file contains a control character. I would like to find all the invisible characters in the file using Emacs.

By Googling, I found the following solution, which uses tr:

tr -cd '\11\12\40-\176' < file_name

How can I do the same thing the Emacs way? I tried hexl-mode, which shows the text and the corresponding hexadecimal byte values in a single buffer, which is great. But how do I find the characters whose ASCII values fall outside octal 11-12 and 40-176 (i.e. anything other than tab, newline, space and the visible characters)? I tried to write a regular expression for that search, but it got quite complicated.
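For reference, the byte set from the tr command can also be used to locate the offending characters instead of deleting them. The following is only a sketch outside Emacs: `find_invisible` is a hypothetical helper name, and carriage returns are deliberately added to the allowed set (an assumption) so that a CRLF file does not match on every line.

```shell
# Hypothetical helper: print the lines (with line numbers) that contain a byte
# other than tab, carriage return, or printable ASCII (space through ~).
# Assumes the file contains no NUL bytes, which some grep builds treat as binary.
find_invisible() {
    tab=$(printf '\t')
    cr=$(printf '\r')
    # LC_ALL=C makes the range " -~" mean bytes 0x20-0x7E.
    # cat -v renders control characters in caret notation (e.g. ^A, ^M).
    LC_ALL=C grep -n "[^${tab}${cr} -~]" "$1" | cat -v
}
```

Calling `find_invisible COPYING` would list each affected line with the control characters made visible in caret notation.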

Talespin_Kit asked Oct 07 '11 12:10



1 Answer

To see invisible characters, you can try whitespace-mode. Spaces and tabs will be displayed with a symbol in a different face. If the coding system is automatically detected as DOS (shown as (DOS) in the mode line), the carriage returns at the end of each line will be hidden. Run revert-buffer-with-coding-system to switch to a Unix or binary coding system (e.g. C-x RET r unix) and they will always show up as ^M. The binary coding system will also display any non-ASCII bytes as escapes (e.g. \351) rather than as decoded characters.
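Once the carriage returns have been confirmed as ^M, they can also be stripped outside Emacs when dos2unix refuses the file. This is a minimal sketch, not part of the answer's method: `strip_cr` is a hypothetical helper, and it removes every carriage return in the file, not only those at line ends. The dos2unix manual also documents a `-f` (`--force`) option for converting files it considers binary.

```shell
# Hypothetical helper: copy a file ($1) to a new file ($2) with all
# carriage returns (octal 15) removed. Other control characters are kept.
strip_cr() {
    LC_ALL=C tr -d '\r' < "$1" > "$2"
}
```

For example, `strip_cr COPYING COPYING.unix` would write a copy with Unix line endings while leaving any other control character in place for inspection.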

ataylor answered Sep 17 '22 11:09