When I cat a file in bash I get the following:
$ cat /tmp/file
microsoft
When I view the same file in vim I get the following:
^@m^@i^@c^@r^@o^@s^@o^@f^@t^@
How can I identify and remove these "non-printable" characters? What does '^@' mean in Vim?
(Just a piece of background information: the file was created by base64-decoding and cutting from the pssh header of an mpd file for Microsoft PlayReady.)
What you see is Vim's visual representation of unprintable characters. It is explained at :help 'isprint':
Non-printable characters are displayed with two characters:
  0 -  31    "^@" - "^_"
 32 - 126    always single characters
    127      "^?"
128 - 159    "~@" - "~_"
160 - 254    "| " - "|~"
    255      "~?"
Therefore, ^@ stands for a null byte (0x00). These (and other non-printable characters) can come from various sources, but in your case it's an ...
If you look closely at your output in Vim, every second byte is a null byte; in between are the expected characters. This is a clear indication that the file uses a multibyte encoding (utf-16, big endian, no byte order mark, to be precise) and that Vim did not detect it, opening the file as latin1 or similar instead (whereas the terminal output looked fine).
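The same "every second byte is NUL" observation can be checked outside Vim. Here is a minimal Python sketch, assuming the file holds exactly the bytes Vim displayed. The sample bytes and the helper name `looks_like_utf16_be` are made up for illustration, and the heuristic only makes sense for text that is mostly ASCII:

```python
# Bytes as Vim displayed them: a NUL before every ASCII character,
# followed by a UTF-16BE line feed (00 0A). This sample is an assumption.
raw = b"\x00m\x00i\x00c\x00r\x00o\x00s\x00o\x00f\x00t\x00\n"

def looks_like_utf16_be(data: bytes) -> bool:
    """Heuristic: non-empty, even length, and every even-indexed byte is NUL."""
    return len(data) > 0 and len(data) % 2 == 0 and all(b == 0 for b in data[0::2])

if looks_like_utf16_be(raw):
    text = raw.decode("utf-16-be")   # no BOM, so name the byte order explicitly
else:
    text = raw.decode("latin-1")     # fallback: one byte per character

utf8_bytes = text.encode("utf-8")    # re-encoded copy without the NUL bytes
print(repr(text))
```

Decoding and re-encoding is generally safer than stripping the NUL bytes directly, since stripping would corrupt any character outside the ASCII range.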
To fix this, you can either explicitly specify the encoding:
:edit ++enc=utf-16 /tmp/file
Or tweak the 'fileencodings' option, so that Vim can detect this automatically. However, be aware that ambiguities (as in your case) make this prone to fail:
For an empty file or a file with only ASCII characters most encodings will work and the first entry of 'fileencodings' will be used (except "ucs-bom", which requires the BOM to be present).
That's why a byte order mark (BOM) is recommended for 16-bit encodings; but that assumes that you have control over the output encoding.
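If you do control the producer, the difference a BOM makes can be illustrated with a short Python sketch. It assumes the text is written from Python, which may not match how the file in the question was produced; the variable names are illustrative only:

```python
import codecs

text = "microsoft\n"

with_bom = text.encode("utf-16")        # generic codec prepends a BOM (native byte order)
without_bom = text.encode("utf-16-be")  # explicit byte order, no BOM is written

# Big-endian data with a BOM added by hand:
be_with_bom = codecs.BOM_UTF16_BE + without_bom

# With the BOM present, the generic "utf-16" decoder can work out the byte order itself.
print(be_with_bom.decode("utf-16") == text)
```

A file written as `be_with_bom` starts with the bytes FE FF, which is what the "ucs-bom" entry in 'fileencodings' looks for.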
^@ is Vim's representation of a null byte. The ^ indicates a non-printable control character, with the following ASCII character indicating which control character it is.
^@ == 0 (NUL)
^A == 1
^B == 2
...
^H == 8
^K == 11
...
^Z == 26
^[ == 27
^\ == 28
^] == 29
^^ == 30
^_ == 31
^? == 127
9 and 10 aren't escaped because they are Tab and Line Feed respectively.
32 to 126 are printable ASCII characters (starting with Space).
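The mapping in the list above follows a simple rule: flip bit 6 of the byte value (XOR with 0x40) to get the character shown after the caret. A small Python sketch of that rule, covering only the ASCII range 0 to 127; the function name `caret` is made up for illustration and this is not Vim's own code. Unlike Vim's default display, it also escapes Tab and Line Feed (as ^I and ^J):

```python
def caret(byte: int) -> str:
    """Return caret notation for an ASCII control byte, else the character itself."""
    if byte < 32 or byte == 127:
        return "^" + chr(byte ^ 0x40)   # 0 -> '@', 27 -> '[', 127 -> '?'
    return chr(byte)

# Render the start of the file from the question the way Vim showed it.
print("".join(caret(b) for b in b"\x00m\x00i\x00c"))
```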