
Is there a way to remove the BOM from a UTF-8 encoded file?


I know that all of my JSON files are encoded in UTF-8, but the data entry person who edited them saved them as UTF-8 with a BOM.

When I run my Ruby scripts to parse the JSON, they fail with an error. I don't want to manually open 58+ JSON files and convert each one to UTF-8 without the BOM.

Abe asked Feb 16 '11 01:02



1 Answer

With Ruby >= 1.9.2 you can use the mode r:bom|utf-8, which skips a leading BOM when the file is read.

This should work (I haven't tested it in combination with JSON):

    require 'json'

    json = nil # define the variable outside the block to keep the data
    File.open('file.txt', "r:bom|utf-8") { |file|
      json = JSON.parse(file.read)
    }

It doesn't matter whether the file actually contains a BOM or not.
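If you would rather fix the files on disk once instead of skipping the BOM on every read, the same mode can be used to rewrite them. This is only a rough sketch: strip_bom is a helper name I made up, the demo runs in a temporary directory, and you would point the glob at your own JSON folder. It also assumes the files really are UTF-8. Try it on a copy first.

```ruby
require 'json'
require 'tmpdir'

# Hypothetical helper (not part of Ruby or of the answer above):
# rewrites one file as UTF-8 without a BOM.
# A file that has no BOM is simply written back with the same content.
def strip_bom(path)
  content = File.open(path, "r:bom|utf-8") { |f| f.read } # BOM is skipped here
  File.open(path, "w:utf-8") { |f| f.write(content) }     # written back without BOM
end

# Demo in a throwaway directory; replace `dir` with your own folder.
Dir.mktmpdir do |dir|
  sample = File.join(dir, "sample.json")
  # Create a test file that starts with the three BOM bytes.
  File.binwrite(sample, "\xEF\xBB\xBF" + '{"name": "Abe"}')

  # Strip the BOM from every JSON file in the directory.
  Dir.glob(File.join(dir, "*.json")).each { |path| strip_bom(path) }

  # The file now parses with a plain UTF-8 read.
  data = JSON.parse(File.read(sample, encoding: "utf-8"))
  puts data["name"]
end
```

Because the write uses text mode, line endings may be translated on Windows; reading and writing in binary mode would avoid that.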


Andrew remarked that File#rewind can't be used with a BOM: it goes back to position 0, which is before the BOM.

If you need rewind-like behaviour, you must remember the position right after opening the file and replace rewind with pos=:

    # Prepare test file
    File.open('file.txt', "w:utf-8") { |f|
      f << "\xEF\xBB\xBF" # add BOM
      f << 'some content'
    }

    # Read file and skip BOM if available
    File.open('file.txt', "r:bom|utf-8") { |f|
      pos = f.pos         # position right after the BOM (if any)
      p content = f.read  # read and print file content
      f.pos = pos         # f.rewind would go to pos 0 instead
      p content = f.read  # (re)read and print file content
    }

knut answered Sep 30 '22 12:09