
How to get vim to show a byte-by-byte representation of file data

I don't want vim to ever interpret my data in any encoding-specific way. In other words, when I'm in vim, I want the character under my cursor to correspond to the actual byte, not a UTF-* (etc.) representation of that byte.

I need to use vim to analyze issues caused by Unicode conversion errors made by other people (using other software) so it's important that I see what is actually there.

For example, in Cygwin's vim, I have been able to see UTF-8 BOMs as

ï»¿[START OF FILE DATA]

This is perfect. I recognize this as a UTF-8 BOM and if I want to know what the hex for each character is, I can put the cursor on the characters and use 'ga'.

I recently got a proper Linux machine (Fedora). In /etc/vimrc, this line exists

set fileencodings=ucs-bom,utf-8,latin1

When I look at a UTF-8 BOM on this machine, the BOM is completely hidden.

When I add the following line to ~/.vimrc

set fileencodings=latin1

I see



The first 3 characters are the BOM (when ga is used against them). I don't know what the last 3 characters are.

At one point, I even saw the UTF-8 BOM represented as "feff" - the UTF-16 BOM.

Anyway, you see my problem. I need to see exactly what is in my file without vim interpreting the bytes for me. I know I could use xxd, od, etc., but vim has always been very convenient as an analysis tool. Plus, I want to be able to edit the files and save them without any conversion problems.

Thanks for your help.

Jesse Hogan asked Aug 31 '12 17:08


3 Answers

Use 'binary' mode:

:edit ++bin file

or

vim -b file

From :help 'binary':

The 'fileencoding' and 'fileencodings' options will not be used, the file is read without conversion.
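What "read without conversion" means at the byte level can be sketched outside Vim. The following Python snippet is only an illustration under stated assumptions: the temporary file and all variable names are hypothetical, and the "utf-8-sig" codec is merely an analogy for a BOM-aware decoding step, not a description of Vim's internals.

```python
import os
import tempfile

# Write a sample file: a UTF-8 BOM followed by ASCII text.
data = b"\xef\xbb\xbf[START OF FILE DATA]\n"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
    path = tmp.name

# Binary mode: the bytes come back exactly as stored.
with open(path, "rb") as f:
    raw = f.read()

# Text mode with a BOM-aware codec: the BOM is silently consumed,
# much like the BOM disappearing from view in the question.
with open(path, encoding="utf-8-sig") as f:
    decoded = f.read()

os.remove(path)

print(raw[:3].hex())   # efbbbf
print(repr(decoded))   # '[START OF FILE DATA]\n'
```

The binary read is the counterpart of vim -b: nothing is decoded, so nothing can be hidden or rewritten.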

Ingo Karkat answered Nov 10 '22 13:11


The sequence Ã¯Â»Â¿ is actually U+FEFF (the BOM) encoded as UTF-8, decoded as latin1, encoded as UTF-8, and decoded as latin1 again. ï»¿ is U+FEFF (the BOM) encoded as UTF-8 and decoded as latin1. You can't get away from encodings. Those aren't the actual bytes; they are the latin1 characters displayed from an incorrect decoding. If you want bytes, use a hex editor; otherwise, use the correct decoding.
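The round trips described above can be reproduced in a few lines of Python (used here only for illustration; the variable names are arbitrary and nothing in it is specific to Vim):

```python
# U+FEFF is the code point used as the byte order mark.
bom = "\ufeff"

# Encoded as UTF-8, the BOM becomes the three bytes EF BB BF.
utf8_bytes = bom.encode("utf-8")

# Decoding those bytes as latin1 maps each byte to one character.
once = utf8_bytes.decode("latin-1")             # 'ï»¿'

# Repeating the cycle doubles the damage: each of those characters
# takes two bytes in UTF-8, so three characters become six.
twice = once.encode("utf-8").decode("latin-1")  # 'Ã¯Â»Â¿'

print(utf8_bytes.hex())   # efbbbf
print(ascii(once))        # '\xef\xbb\xbf'
print(ascii(twice))       # '\xc3\xaf\xc2\xbb\xc2\xbf'
```

Three characters on screen therefore suggest a single wrong decoding, while six suggest the data was converted twice somewhere upstream.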

Mark Tolonen answered Nov 10 '22 12:11


I get some good mileage from running :e ++enc=latin1 after loading the file (Vim's initial guess at the encoding isn't important at this stage).
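One reason latin1 is a convenient choice is that it assigns exactly one character to every byte value from 0x00 to 0xFF, so decoding and re-encoding cannot lose information. A small Python sketch of that property (Python is used only for illustration; the names are arbitrary):

```python
# Every possible byte value, 0x00 through 0xFF.
all_bytes = bytes(range(256))

# latin1 maps byte N to code point U+00NN, one character per byte.
text = all_bytes.decode("latin-1")
one_char_per_byte = len(text) == 256 and all(
    ord(ch) == b for ch, b in zip(text, all_bytes)
)

# Encoding back yields the identical byte sequence.
round_trips = text.encode("latin-1") == all_bytes

# UTF-8, by contrast, rejects some byte sequences outright.
try:
    b"\xff\xfe".decode("utf-8")
    utf8_accepts = True
except UnicodeDecodeError:
    utf8_accepts = False

print(one_char_per_byte, round_trips, utf8_accepts)  # True True False
```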

Алексей Киричун answered Nov 10 '22 12:11