In a Rails app, I'm accepting and parsing CSV files that may use any of three possible line endings: `\n` (LF), `\r\n` (CR+LF), or `\r` (CR). Ruby's `File` and `CSV` libraries seem to handle the first two cases just fine, but the last case ("Mac classic" `\r` line endings) isn't handled as a newline. It's important to be able to accept this format as well as the others, since Microsoft Excel for Mac (running on OS X) seems to use it when exporting to "Comma Separated Values" (although exporting to "Windows Comma Separated" produces the easier-to-handle `\r\n`).
Python has "universal newline support" and will handle any of these three formats without a problem. Is there something similar in Ruby that will accept all three without knowing the format in advance?
You could use `:row_sep => :auto`:

> `:row_sep`
> The String appended to the end of each row. This can be set to the special `:auto` setting, which requests that CSV automatically discover this from the data. Auto-discovery reads ahead in the data looking for the next `"\r\n"`, `"\n"`, or `"\r"` sequence.
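A minimal sketch of the `:row_sep => :auto` approach, assuming a Ruby version whose bundled `CSV` accepts keyword-style options (1.9+). The sample string and the `mac_csv`/`rows` names are made up for illustration; they are not from the question. For a file on disk, `CSV.foreach(path, row_sep: :auto)` should apply the same option.

```ruby
require "csv"

# Illustrative sample using classic Mac OS line endings ("\r" only).
mac_csv = "name,qty\rapple,3\rpear,5\r"

# row_sep: :auto asks CSV to look ahead for the first "\r\n", "\n",
# or "\r" and use that as the row separator.
rows = CSV.parse(mac_csv, row_sep: :auto)

rows.each { |row| p row }
# ["name", "qty"]
# ["apple", "3"]
# ["pear", "5"]
```

As far as I know, `:auto` is already the default `row_sep` in the `CSV` library bundled since Ruby 1.9, so passing it explicitly mostly documents intent.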
There are some caveats, of course; see the `CSV` documentation for details.
You could also manually clean up the EOLs with a bit of `gsub`-ing before handing the data to CSV for parsing. I'd probably take this route and convert all `\r\n`s and `\r`s to single `\n`s before attempting to parse the CSV. On the other hand, this won't work that well if your CSV contains embedded binary data where `\r`s mean something. On the gripping hand, this is CSV we're dealing with, so who knows what sort of crazy broken nonsense you'll end up dealing with.
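A rough sketch of the `gsub` route. The `normalize_newlines` helper and the sample string are hypothetical, made up for this example. The regex `/\r\n?/` matches a CR optionally followed by an LF, so a `\r\n` pair collapses to one `\n` instead of two. Like any blind replacement, it will also rewrite `\r`s that were meant to be data.

```ruby
require "csv"

# Hypothetical helper: turn "\r\n" (Windows) and bare "\r" (classic Mac)
# into "\n". The optional "\n?" makes a CRLF pair match as one unit.
def normalize_newlines(text)
  text.gsub(/\r\n?/, "\n")
end

# Illustrative input that mixes all three line-ending styles.
raw = "name,qty\r\napple,3\rpear,5\n"

rows = CSV.parse(normalize_newlines(raw))
rows.each { |row| p row }
```

One difference from `:auto`: auto-detection settles on a single separator for the whole input, while normalizing first also copes with a file that mixes endings, as the sample above does.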