
Ruby CSV BOM|UTF-8 encoding for StringIO

Ruby 2.6.3.

I have been trying to parse a StringIO object with CSV using the bom|utf-8 encoding, so that the unwanted BOM character is stripped and the content is read as UTF-8:

require 'csv'

CSV_READ_OPTIONS = { headers: true, encoding: 'bom|utf-8' }.freeze

content = StringIO.new("\xEF\xBB\xBFid\n123")
first_row = CSV.parse(content, **CSV_READ_OPTIONS).first

first_row.headers.first.include?("\xEF\xBB\xBF")     # This returns true

Apparently the bom|utf-8 encoding does not work for StringIO objects, but I found that it does work for files, for instance:

require 'csv'

CSV_READ_OPTIONS = { headers: true, encoding: 'bom|utf-8' }.freeze

# File content is: "\xEF\xBB\xBFid\n12"
first_row = CSV.read('bom_content.csv', **CSV_READ_OPTIONS).first

first_row.headers.first.include?("\xEF\xBB\xBF")     # This returns false

Considering that I need to work with StringIO directly, why does CSV ignore the bom|utf-8 encoding? Is there any way to remove the BOM character from the StringIO instance?

Thank you!

jovannypcg asked Sep 25 '19 15:09


1 Answer

Ruby 2.7 added the set_encoding_by_bom method to IO, and StringIO provides it as well. This method consumes the byte order mark, if present, and sets the encoding accordingly.

require 'csv'
require 'stringio'

CSV_READ_OPTIONS = { headers: true }.freeze

content = StringIO.new("\xEF\xBB\xBFid\n123")
content.set_encoding_by_bom

first_row = CSV.parse(content, **CSV_READ_OPTIONS).first
first_row.headers.first.include?("\xEF\xBB\xBF")
#=> false
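
On Ruby versions before 2.7 (such as the 2.6.3 mentioned in the question), set_encoding_by_bom is not available. One possible workaround, shown here only as a minimal sketch, is to read the StringIO into a string, strip a leading BOM by hand, and parse the result. The helper name parse_csv_without_bom is hypothetical and not part of the csv library, and the sketch assumes the content is UTF-8 and small enough to read into memory at once.

```ruby
require 'csv'
require 'stringio'

# Hypothetical helper (not part of the csv library): reads the whole IO,
# treats the bytes as UTF-8, and drops a leading BOM (U+FEFF) if present.
# Assumes the input really is UTF-8 and fits in memory.
def parse_csv_without_bom(io, **options)
  text = io.read.force_encoding(Encoding::UTF_8)
  CSV.parse(text.delete_prefix("\uFEFF"), **options)
end

content = StringIO.new("\xEF\xBB\xBFid\n123")
first_row = parse_csv_without_bom(content, headers: true).first
first_row.headers.first.include?("\xEF\xBB\xBF")
#=> false
```

Input without a BOM passes through unchanged, because delete_prefix only removes the prefix when it is actually there.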
3limin4t0r answered Oct 07 '22 09:10