Is there a way to remove the BOM from a UTF-8 encoded file?
I know that all of my JSON files are encoded in UTF-8, but the data entry person who edited the JSON files saved them as UTF-8 with the BOM.
When I run my Ruby scripts to parse the JSON, they fail with an error. I don't want to manually open 58+ JSON files and convert each one to UTF-8 without the BOM.
The UTF-8 encoding without a BOM has the property that a document which contains only characters from the US-ASCII range is encoded byte-for-byte the same way as the same document encoded using the US-ASCII encoding. Such a document can be processed and understood when encoded either as UTF-8 or as US-ASCII.
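The byte-for-byte property described above can be checked directly. The following is only a small sketch; the method names `same_bytes_as_ascii?` and `same_bytes_with_bom?` are made up for this illustration.

```ruby
# The three bytes of the UTF-8 byte order mark.
UTF8_BOM = [0xEF, 0xBB, 0xBF].freeze

# Hypothetical helper: true if the ASCII-only text has identical bytes
# in UTF-8 (without BOM) and in US-ASCII.
def same_bytes_as_ascii?(text)
  text.encode('UTF-8').bytes == text.encode('US-ASCII').bytes
end

# Hypothetical helper: the same comparison, but with a BOM prepended
# to the UTF-8 bytes.
def same_bytes_with_bom?(text)
  UTF8_BOM + text.encode('UTF-8').bytes == text.encode('US-ASCII').bytes
end

puts same_bytes_as_ascii?('{"key": "value"}')
puts same_bytes_with_bom?('{"key": "value"}')
```

With the BOM in front, the two byte sequences no longer match, which is why a parser that expects plain UTF-8 or ASCII can stumble over the first three bytes.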
If you want to remove the byte order mark from a file by hand, you need a text editor that lets you choose whether the mark is saved. Open the file that contains the BOM in the editor, then save it again as UTF-8 without the BOM. The mark should then no longer appear.
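For many files, the same "read it, save it again without the BOM" step can be scripted instead of done in an editor. This is a minimal sketch, not a tested tool: `strip_bom!` and `BOM_BYTES` are names invented here, and the script overwrites files in place, so it is worth trying on copies first.

```ruby
# The UTF-8 byte order mark as a binary (ASCII-8BIT) string.
BOM_BYTES = "\xEF\xBB\xBF".b

# Hypothetical helper: rewrites the file at `path` without a leading
# UTF-8 BOM. Returns true if a BOM was removed, false if there was none.
def strip_bom!(path)
  data = File.binread(path)                 # raw bytes, no transcoding
  return false unless data.start_with?(BOM_BYTES)

  # Write back everything after the three BOM bytes.
  File.binwrite(path, data.byteslice(BOM_BYTES.bytesize..-1))
  true
end
```

Assuming the JSON files sit in the current directory, something like `Dir.glob('*.json').each { |path| strip_bom!(path) }` would process all of them in one go; the glob pattern is only an example.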
With Ruby >= 1.9.2 you can use the mode r:bom|utf-8.
This should work (I haven't tested it in combination with JSON):
    require 'json'

    json = nil # define the variable outside the block to keep the data
    File.open('file.txt', "r:bom|utf-8"){|file|
      json = JSON.parse(file.read)
    }
It doesn't matter whether the BOM is present in the file or not.
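To see the "with or without BOM" behaviour together with JSON, here is a small self-contained sketch. It writes two temporary files with the same content, one with a BOM and one without, and parses both through the same mode string. The helper name `parse_json_file` and the file names are assumptions made for this example.

```ruby
require 'json'
require 'tmpdir'

# Hypothetical helper: parse a JSON file, skipping a UTF-8 BOM if present.
def parse_json_file(path)
  File.open(path, 'r:bom|utf-8') { |f| JSON.parse(f.read) }
end

Dir.mktmpdir do |dir|
  with_bom    = File.join(dir, 'with_bom.json')
  without_bom = File.join(dir, 'without_bom.json')

  # Same JSON payload, once with the three BOM bytes in front, once without.
  File.binwrite(with_bom, "\xEF\xBB\xBF".b + '{"a": 1}')
  File.binwrite(without_bom, '{"a": 1}')

  p parse_json_file(with_bom)
  p parse_json_file(without_bom)
end
```

Both calls should print the same hash, so the reading code does not need to know in advance which files were saved with a BOM.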
Andrew remarked that File#rewind can't be used with a BOM.
If you need a rewind function, you must remember the position and replace rewind with pos=:
    # Prepare test file
    File.open('file.txt', "w:utf-8"){|f|
      f << "\xEF\xBB\xBF" # add BOM
      f << 'some content'
    }
    # Read file and skip BOM if available
    File.open('file.txt', "r:bom|utf-8"){|f|
      pos = f.pos
      p content = f.read # read and write file content
      f.pos = pos # f.rewind goes to pos 0
      p content = f.read # (re)read and write file content
    }