How to get vim to show a byte-by-byte representation of file data

Question

I don't want vim to ever interpret my data in any encoding specific way. In other words, when I'm in vim, I want the character that my cursor is on to correspond to the actual byte, not a utf* (etc.) representation of that byte.

I need to use vim to analyze issues caused by Unicode conversion errors made by other people (using other software) so it's important that I see what is actually there.

For example, in Cygwin's vim, I have been able to see UTF-8 BOMs as

ï»¿ [START OF FILE DATA]

This is perfect. I recognize this as a UTF-8 BOM and if I want to know what the hex for each character is, I can put the cursor on the characters and use 'ga'.

I recently got a proper Linux machine (Fedora). In /etc/vimrc, this line exists

set fileencodings=ucs-bom,utf-8,latin1

When I look at a UTF-8 BOM on this machine, the BOM is completely hidden.

When I add the following line to ~/.vimrc

set fileencodings=latin1

I see

Ã¯Â»Â¿

The first 3 characters are the BOM (when ga is used against them). I don't know what the last 3 characters are.

At one point, I even saw the UTF-8 BOM represented as "feff" - the UTF-16 BOM.

Anyway, you see my problem. I need to see exactly what is in my file without vim interpreting the bytes for me. I know I could use xxd, od, etc., but vim has always been very convenient as an analysis tool. Plus, I want to be able to edit the files and save them without any conversion problems.

Thanks for your help.

Ingo Karkat · Accepted Answer

Use 'binary' mode:

:edit ++bin file

or

vim -b file

From :help 'binary':

The 'fileencoding' and 'fileencodings' options will not be used, the file is read without conversion.

Mark Tolonen · Answer

The sequence Ã¯Â»Â¿ is U+FEFF (the BOM) encoded as UTF-8, decoded as latin1, encoded as UTF-8 again, and decoded as latin1 again. ï»¿ is U+FEFF encoded as UTF-8 and decoded as latin1 once. You can't get away from encodings: those aren't the actual bytes, they are the latin1 characters displayed from an incorrect decoding. If you want bytes, use a hex editor; otherwise, use the correct decoding.
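
The round trip described here can be checked outside vim. The following is a minimal Python sketch using only the standard library; the variable names are illustrative, and it models the encode/decode steps in general rather than anything vim does internally.

```python
# U+FEFF is the code point; EF BB BF is its UTF-8 byte sequence.
bom_bytes = "\ufeff".encode("utf-8")
assert bom_bytes == b"\xef\xbb\xbf"

# One wrong decode: the three bytes read as latin1 become three characters.
once = bom_bytes.decode("latin-1")

# A second round: those characters are written out as UTF-8,
# and the result is read as latin1 again, giving six characters.
twice = once.encode("utf-8").decode("latin-1")

# ascii() keeps the output printable on any terminal encoding.
print(len(once), ascii(once))
print(len(twice), ascii(twice))
```

The three-character string corresponds to ï»¿ and the six-character string to Ã¯Â»Â¿, which would also explain why "feff" can show up: it is the code point, whereas EF BB BF is how that code point is stored in UTF-8.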

Алексей Киричун · Answer

I get good mileage from running :e ++enc=latin1 after loading the file (Vim's initial guess at the encoding isn't important at this stage).
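
A likely reason latin1 works for this purpose is that it maps every byte value 0x00 to 0xFF to the code point with the same number, so no byte is rejected or merged with its neighbours. The sketch below demonstrates that property with Python's latin-1 codec; it is an illustration of the encoding, not of vim's implementation, and the names are hypothetical.

```python
# Every possible byte value, once each.
all_bytes = bytes(range(256))

# Decoding as latin1 never fails and yields one character per byte,
# with the code point equal to the byte value.
text = all_bytes.decode("latin-1")
assert [ord(c) for c in text] == list(range(256))

# Encoding back reproduces the original bytes exactly.
assert text.encode("latin-1") == all_bytes

# UTF-8, by contrast, rejects some byte sequences outright.
try:
    b"\xff\xfe".decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print("utf-8 accepted ff fe:", utf8_ok)
```

Because each character stands for exactly one byte, the value reported by 'ga' under a latin1 decoding is the byte value itself.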
