I have some working code that uses a crutch to add a BOM marker to a new file.
# writing
File.open name, 'w', 0644 do |file|
  file.write "\uFEFF"
  file.write @data
end
# reading
File.open name, 'r:bom|utf-8' do |file|
  file.read
end
Is there any way to add the marker automatically, without writing the cryptic "\uFEFF"
before the data? Something like File.open name, 'w:bom' (this mode has no effect),
maybe?
**** This answer led to a new gem: file_with_bom ****
I had a similar problem in the past, and I extended File.open with additional encoding variants for the w-mode:
class File
  BOM_LIST_hex = {
    Encoding::UTF_8    => "\xEF\xBB\xBF",     # U+FEFF encoded as UTF-8
    Encoding::UTF_16BE => "\xFE\xFF",
    Encoding::UTF_16LE => "\xFF\xFE",
    Encoding::UTF_32BE => "\x00\x00\xFE\xFF",
    Encoding::UTF_32LE => "\xFF\xFE\x00\x00", # little endian: FF FE comes first
  }
  BOM_LIST_hex.freeze

  # BOM bytes for the given encoding (default: the external encoding of the file)
  def utf_bom_hex(encoding = external_encoding)
    BOM_LIST_hex[encoding]
  end

  class << self
    alias :open_old :open

    def open(filename, mode_string = 'r', options = {}, &block)
      # check for bom-flag in mode_string; work on copies, the arguments may be frozen
      options = options.dup
      use_bom = options.delete(:bom) || mode_string =~ /-bom/i
      mode_string = mode_string.sub(/-bom/i, '')
      f = open_old(filename, mode_string, **options)
      if use_bom
        case mode_string
        # r|bom already standard since 1.9.2
        when /\Ar/ # read mode -> remove BOM
          bom = f.read(f.utf_bom_hex.bytesize)
          # check if it was really a BOM; if not, return to position 0
          f.rewind unless bom == f.utf_bom_hex.b
        when /\Aw/ # write mode -> attach BOM (the file is already open, do not reopen it)
          f << f.utf_bom_hex.dup.force_encoding(f.external_encoding)
        end # mode_string
      end
      return f unless block_given?
      begin
        yield f
      ensure
        f.close # close the file even if the block raises
      end
    end
  end
end # File
Test code:
EXAMPLE_TEXT = 'some content öäü'
# UTF-16LE: write with BOM, then read with the -bom flag, with :bom => true, and without BOM handling
File.open("file_utf16le.txt", "w:utf-16le-bom"){|f| f << EXAMPLE_TEXT }
File.open("file_utf16le.txt", "r:utf-16le-bom:utf-8"){|f| p f.read }
File.open("file_utf16le.txt", "r:utf-16le:utf-8", :bom => true ){|f| p f.read }
File.open("file_utf16le.txt", "r:utf-16le:utf-8"){|f| p f.read } # the BOM stays in the data
# UTF-8: the same checks
File.open("file_utf8.txt", "w:utf-8", :bom => true ){|f| f << EXAMPLE_TEXT }
File.open("file_utf8.txt", "r:utf-8", :bom => true ){|f| p f.read }
File.open("file_utf8.txt", "r:utf-8-bom"){|f| p f.read }
File.open("file_utf8.txt", "r:utf-8"){|f| p f.read } # the BOM stays in the data
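To check which BOM a file actually starts with, the leading bytes can be inspected directly. The following is only a sketch: `BOMS` and `detect_bom` are hypothetical names, not part of the code above or of Ruby itself. The table is ordered longest first because the UTF-32LE BOM (FF FE 00 00) begins with the UTF-16LE BOM (FF FE).

```ruby
# BOM byte sequences, longest first, so UTF-32LE is tested before UTF-16LE.
BOMS = {
  Encoding::UTF_32BE => [0x00, 0x00, 0xFE, 0xFF],
  Encoding::UTF_32LE => [0xFF, 0xFE, 0x00, 0x00],
  Encoding::UTF_8    => [0xEF, 0xBB, 0xBF],
  Encoding::UTF_16BE => [0xFE, 0xFF],
  Encoding::UTF_16LE => [0xFF, 0xFE],
}.freeze

# Hypothetical helper: return the Encoding whose BOM the file starts with, or nil.
def detect_bom(path)
  head = File.binread(path, 4).to_s.bytes # at most the first four bytes
  match = BOMS.find { |_encoding, bom| head.first(bom.size) == bom }
  match && match.first
end
```

Note that a UTF-16LE file whose first character after the BOM is U+0000 would be reported as UTF-32LE by this sketch; byte inspection alone cannot tell the two apart.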
Some remarks:
The code uses -bom as a BOM indicator (Ruby 1.9 uses |bom).
Some fixes needed to make it better: use |bom instead of -bom, and use the standard r|bom for reading.
Perhaps I will find some time tomorrow to refactor my code and provide it as a gem.
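For reading, the built-in form (the 'r:bom|utf-8' mode from the question) already strips a leading BOM without any patching. A small sketch, where `read_utf8_without_bom` is a hypothetical helper name:

```ruby
# Hypothetical helper: read a UTF-8 file, dropping a leading BOM if one is present.
def read_utf8_without_bom(path)
  File.open(path, 'r:bom|utf-8') { |file| file.read }
end
```

With a plain 'r:utf-8' mode the BOM is not removed and shows up as a leading "\uFEFF" character in the string.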
Alas, I think your manual approach is the way to go; at least I don't know a better way:
http://blog.grayproductions.net/articles/miscellaneous_m17n_details
To quote from JEG2's article:
Ruby 1.9 won't automatically add a BOM to your data, so you're going to need to take care of that if you want one. Luckily, it's not too tough. The basic idea is just to print the bytes needed at the beginning of a file.
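A minimal sketch of that manual approach, wrapped in a helper so the "\uFEFF" lives in one place. `write_with_bom` and its parameters are hypothetical names, not an existing API. It relies on Ruby transcoding written strings to the external encoding of the file, so the same U+FEFF character should come out as the right BOM bytes for UTF-8, UTF-16 and UTF-32.

```ruby
# Hypothetical helper: write data to path, preceded by a BOM in the given encoding.
def write_with_bom(path, data, encoding = 'utf-8')
  File.open(path, "w:#{encoding}") do |file|
    file.write("\uFEFF") # U+FEFF is transcoded to the BOM bytes of the external encoding
    file.write(data)
  end
end
```

For example, `write_with_bom('out.txt', 'hi', 'utf-16le')` should produce a file that starts with the bytes FF FE.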