I am trying to parse a CSV file generated from an Excel spreadsheet.
Here is my code:
require 'csv'
file = File.open("input_file")
csv = CSV.parse(file)
But I get this error:
ArgumentError: invalid byte sequence in UTF-8
I think the error occurs because Excel encodes the file as ISO-8859-1 (Latin-1)
and not as UTF-8.
Can someone help me with a workaround for this issue, please?
Thanks in advance.
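Ruby, however, does not know that the original encoding of the file is ISO-8859-1 and will by default interpret it as UTF-8. As a result, any operation that has to interpret the bytes as characters, such as splitting or parsing the string, fails with the infamous "invalid byte sequence in UTF-8" error. The invalid sequence here is our "Å" character: ISO-8859-1 stores it as the single byte C5, which on its own is not a valid UTF-8 sequence, because UTF-8 encodes Å as the two bytes C3 85.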
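To make the failure concrete, here is a minimal sketch that reproduces it without a file. The sample bytes are made up for illustration (they are not from the asker's spreadsheet), and `split` simply stands in for any operation that needs valid characters.

```ruby
# Hypothetical sample: "Ångström,1" as ISO-8859-1 bytes (Å = 0xC5, ö = 0xF6).
raw = "\xC5ngstr\xF6m,1\n".b # .b gives a binary copy with no text encoding

# What effectively happens by default: the bytes are labelled UTF-8, unchecked.
as_utf8 = raw.dup.force_encoding(Encoding::UTF_8)
valid = as_utf8.valid_encoding? # false: 0xC5 is not followed by a continuation byte

# Any operation that must read characters now fails.
error_message =
  begin
    as_utf8.split(",")
    nil
  rescue ArgumentError => e
    e.message
  end

# Labelling the same bytes correctly and transcoding them fixes the problem.
fixed = raw.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)

puts valid
puts error_message
puts fixed
```

The bytes never change in the failing case; only the label Ruby attaches to them is wrong.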
Ruby's default encoding since 2.0 is UTF-8. This means that Ruby treats any string you give it as a UTF-8 encoded string unless you tell it explicitly that it is encoded differently. The Å character is a good example of this problem.
Every character in UTF-8 is a sequence of 1 to 4 bytes. Apart from UTF-8 there are other encodings, such as ISO-8859-1 or Windows-1252, whose names you may have seen before. These are single-byte encodings: each covers up to 256 characters, including accented Latin letters such as Å, and stores every character in exactly one byte.
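The difference in byte lengths can be checked directly in Ruby. This is a small sketch; the sample characters are chosen only for illustration.

```ruby
# One sample character for each UTF-8 length: a, Å, €, and an emoji.
samples = ["a", "\u00C5", "\u20AC", "\u{1F600}"]
utf8_sizes = samples.map(&:bytesize) # [1, 2, 3, 4]

# The same character Å in three encodings.
a_ring_utf8   = "\u00C5".bytes                                 # [195, 133] (C3 85)
a_ring_latin1 = "\u00C5".encode(Encoding::ISO_8859_1).bytes    # [197]      (C5)
a_ring_cp1252 = "\u00C5".encode(Encoding::Windows_1252).bytes  # [197]      (C5)

p utf8_sizes
p a_ring_utf8
p a_ring_latin1
p a_ring_cp1252
```

The single byte C5 written by a Latin-1 or Windows-1252 exporter is therefore only the first half of a two-byte UTF-8 sequence, which is why a UTF-8 reader rejects it.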
You need to tell Ruby that the file is in ISO-8859-1. Change your file open line to this:
file = File.open("input_file", "r:ISO-8859-1")
The second argument, "r:ISO-8859-1", tells Ruby to open the file read-only and to treat its contents as ISO-8859-1.
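Putting the answer together, here is a hedged end-to-end sketch. It goes one step beyond the answer by using the mode string "r:ISO-8859-1:UTF-8", which reads the file as ISO-8859-1 and transcodes it to UTF-8 as it is read, so the parsed fields are ordinary UTF-8 strings. The temporary file and its contents are hypothetical stand-ins for the real Excel export.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical stand-in for the Excel export: ISO-8859-1 bytes (é = 0xE9, ö = 0xF6).
latin1_data = "name,city\nJos\xE9,Malm\xF6\n".b

rows = nil
rows_via_csv_read = nil

Tempfile.create(["input_file", ".csv"]) do |tmp|
  tmp.binmode            # write the bytes exactly as given
  tmp.write(latin1_data)
  tmp.flush

  # External encoding ISO-8859-1, internal encoding UTF-8.
  File.open(tmp.path, "r:ISO-8859-1:UTF-8") do |file|
    rows = CSV.parse(file.read)
  end

  # Equivalent shortcut: let CSV open the file with the same encoding pair.
  rows_via_csv_read = CSV.read(tmp.path, encoding: "ISO-8859-1:UTF-8")
end

p rows
p rows[1][0].encoding
```

If the file actually came from Excel on Windows, the encoding may be Windows-1252 rather than ISO-8859-1; in that case the same approach applies with "r:Windows-1252:UTF-8".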