
How to pre-process CSV data for FasterCSV?

We're having a significant number of problems creating a bulk upload function for our little app. We're using the FasterCSV gem to upload data to a MySQL database, but FasterCSV is so twitchy and precise in its requirements that it constantly breaks with malformed CSV errors and timeout errors.

The CSV files are generally created by users pasting text from their web sites or from Microsoft Word docs, so it is not reasonable to expect that there will never be odd characters like smart quotes or accents in the data. Also, users won't readily be able to tell whether their data is clean enough for FasterCSV. We need to find a way to fix it for them automatically.

Is there a good way or a reliable tool for pre-processing CSV data to fix any nits before the FasterCSV gem processes it?

Katherine Chalmers asked Mar 09 '10 19:03


3 Answers

Try the CSV library in the standard lib. It is more forgiving about malformed CSV: http://ruby-doc.org/stdlib/libdoc/csv/rdoc/index.html
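
Whichever parser is used, one way to keep a single bad row from aborting the whole upload is to parse line by line and rescue the parser's error. The following is a minimal sketch using the standard `csv` library as shipped with current Ruby (where, since Ruby 1.9, it is itself based on FasterCSV). The helper name `parse_leniently` is hypothetical, and the sketch assumes no field contains an embedded newline.

```ruby
require "csv"

# Hypothetical helper: parse CSV text one line at a time, keeping the rows
# that parse and recording the line number and message for those that do not.
# Assumption: no field spans multiple lines.
def parse_leniently(text)
  rows = []
  errors = []
  text.each_line.with_index(1) do |line, lineno|
    begin
      row = CSV.parse_line(line.chomp)
      rows << row if row # parse_line returns nil for a blank line
    rescue CSV::MalformedCSVError => e
      errors << [lineno, e.message]
    end
  end
  [rows, errors]
end

# The third line has a stray quote inside a quoted field.
sample = "name,city\nAnn,\"Paris\"\nBob,\"Ber\"lin\n"
rows, errors = parse_leniently(sample)
errors.each { |lineno, message| puts "line #{lineno}: #{message}" }
```

The collected errors can then be shown to the user with line numbers, instead of failing the whole import.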

derfred answered Oct 24 '22 19:10


You can pass the file's encoding into the FasterCSV options when creating a new instance of the FasterCSV parser (see the docs here: http://fastercsv.rubyforge.org/classes/FasterCSV.html#M000018).

Setting it to UTF-8 or the Microsoft encoding should get it past most dodgy extra characters, allowing it to actually parse the data into your required strings... then you can clean the strings to your heart's content.
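
As a rough sketch of that "fix the encoding, parse, then clean" order, the example below uses the standard `csv` library rather than the FasterCSV gem. The helper names (`to_utf8`, `clean_field`, `parse_and_clean`) and the replacement table are hypothetical, and it assumes that any upload which is not valid UTF-8 is Windows-1252, which may not hold for your users.

```ruby
require "csv"

# Typographic characters commonly pasted from Word, mapped to ASCII.
REPLACEMENTS = {
  "\u2018" => "'", "\u2019" => "'", # curly single quotes
  "\u201C" => '"', "\u201D" => '"', # curly double quotes
  "\u2013" => "-", "\u2014" => "-", # en and em dashes
  "\u00A0" => " "                   # non-breaking space
}.freeze
SMART_CHARS = Regexp.union(REPLACEMENTS.keys)

# Return the raw upload as valid UTF-8.
# Assumption: bytes that are not valid UTF-8 are Windows-1252.
def to_utf8(raw)
  text = raw.dup.force_encoding(Encoding::UTF_8)
  return text if text.valid_encoding?
  raw.dup.force_encoding(Encoding::Windows_1252)
     .encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: "?")
end

# Clean one parsed field; nil (an empty cell) is passed through.
def clean_field(value)
  value.nil? ? nil : value.gsub(SMART_CHARS, REPLACEMENTS)
end

# Parse first, clean afterwards, so that a curly quote turned into a
# straight quote cannot confuse the CSV parser.
def parse_and_clean(raw)
  CSV.parse(to_utf8(raw)).map { |row| row.map { |field| clean_field(field) } }
end

# Windows-1252 bytes: \xE9 is an accented e, \x93 and \x94 are curly quotes.
upload = "name,note\nJos\xE9,\x93hi\x94\n".b
cleaned = parse_and_clean(upload)
```

Cleaning after parsing is deliberate: replacing curly double quotes before parsing could introduce unescaped `"` characters and create new malformed rows. When reading from a file, `CSV.read(path, encoding: "windows-1252:utf-8")` is another way to ask Ruby to transcode while reading.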

There's also something in the docs about "converters" that you can pass in. Though this is aimed more at converting, say, numeric or date types, you might be able to use it to gsub out the dodgy chars.
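
A converter used this way might look like the sketch below. It is written against the standard `csv` library's `converters:` option; the constant name `ASCII_PUNCTUATION` and the choice of characters are illustrative assumptions, not part of either library.

```ruby
require "csv"

# Hypothetical converter: replaces curly quotes with their ASCII equivalents.
# Non-string values (for example nil for an empty cell) are left alone.
ASCII_PUNCTUATION = lambda do |field|
  field.is_a?(String) ? field.tr("\u2018\u2019\u201C\u201D", %q(''"")) : field
end

text = "name,quote\nAnn,It\u2019s fine\n"
rows = CSV.parse(text, converters: [ASCII_PUNCTUATION])
```

The converter runs on each field as it is parsed, so the rows come back already cleaned and no second pass over the data is needed.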

Taryn East answered Oct 24 '22 19:10


Try the smarter_csv gem. You can pass a block to its process method and clean up the data before it is used:

https://github.com/tilo/smarter_csv

Tilo answered Oct 24 '22 18:10