Ruby 2.6.3.
I have been trying to parse a StringIO
object into a CSV
instance with the bom|utf-8
encoding, so that the BOM character (undesired) is stripped and the content is encoded to UTF-8:
require 'csv'
CSV_READ_OPTIONS = { headers: true, encoding: 'bom|utf-8' }.freeze
content = StringIO.new("\xEF\xBB\xBFid\n123")
first_row = CSV.parse(content, CSV_READ_OPTIONS).first
first_row.headers.first.include?("\xEF\xBB\xBF") # This returns true
Apparently the bom|utf-8
encoding does not work for StringIO
objects, but I found that it does work for files, for instance:
require 'csv'
CSV_READ_OPTIONS = { headers: true, encoding: 'bom|utf-8' }.freeze
# File content is: "\xEF\xBB\xBFid\n12"
first_row = CSV.read('bom_content.csv', CSV_READ_OPTIONS).first
first_row.headers.first.include?("\xEF\xBB\xBF") # This returns false
Considering that I need to work with StringIO
directly, why does CSV
ignores the bom|utf-8
encoding? Is there any way to remove the BOM character from the StringIO
instance?
Thank you!
Ruby 2.7 added the set_encoding_by_bom
method to IO
. This methods consumes the byte order mark and sets the encoding.
require 'csv'
require 'stringio'
CSV_READ_OPTIONS = { headers: true }.freeze
content = StringIO.new("\xEF\xBB\xBFid\n123")
content.set_encoding_by_bom
first_row = CSV.parse(content, CSV_READ_OPTIONS).first
first_row.headers.first.include?("\xEF\xBB\xBF")
#=> false
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With