In my app (Rails 3.0.5, Ruby 1.8.7), I created an import tool to import CSV data from file.
Problem: I asked my users to export the CSV file from Excel in UTF-8 encoding but they don't do it most of time.
How can I just verify if the file is UTF-8 before importing ? Else the import will run but give strange results. I use FasterCSV to import.
Exemple of bad CSV file:
;VallÈe du RhÙne;CÙte Rotie;
Thanks.
You can use Charlock Holmes, a character encoding detecting library for Ruby.
https://github.com/brianmario/charlock_holmes
To use it, you just read the file, and use the detect
method.
contents = File.read('test.xml')
detection = CharlockHolmes::EncodingDetector.detect(contents)
# => {:encoding => 'UTF-8', :confidence => 100, :type => :text}
You can also convert the encoding to UTF-8 if it is not in the correct format:
utf8_encoded_content = CharlockHolmes::Converter.convert contents, detection[:encoding], 'UTF-8'
This saves users from having to do it themselves before uploading it again.
For 1.9 it's obvious, you just tell it to expect utf8 and it will raise an error if it isn't:
begin
lines = CSV.read('bad.csv', :encoding => 'utf-8')
rescue ArgumentError
puts "My users don't listen to me!"
end
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With