
Reading a csv file with embedded quotes into R

I have to work with a .csv file that comes like this:

"IDEA ID,""IDEA TITLE"",""VOTE VALUE"""
"56144,""Net Present Value PLUS (NPV+)"",1"
"56144,""Net Present Value PLUS (NPV+)"",1"

If I use read.csv, I obtain a data frame with one variable. What I need is a data frame with three columns, where columns are separated by commas. How can I handle the quotes at the beginning of the line and the end of the line?

user3819143 asked Feb 03 '26 13:02


1 Answer

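I don't think there's going to be an easy way to do this without stripping the initial and terminal quotation marks first: because each line is wrapped in quotes as a whole, read.csv treats the entire line as a single quoted field. If you have sed on your system (Linux, macOS, or Windows with Cygwin), then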

read.csv(pipe("sed -e 's/^\"//' -e 's/\"$//' qtest.csv"))

should work. Otherwise

read.csv(text=gsub("(^\"|\"$)","",readLines("qtest.csv")))

is a little less efficient for big files (you have to read in the whole thing before processing it), but should work anywhere.

(There may be a way to do the regular expression for sed in the same, more-compact form using parentheses that the second example uses, but I got tired of trying to sort out where all the backslashes belonged.)
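To try the readLines/gsub approach without the original file, here is a minimal sketch that writes the three sample lines from the question to a temporary file and then parses them. The temporary file stands in for qtest.csv, and the names raw_lines, qtest, stripped, and ideas are only illustrative.

```r
# Recreate the sample data from the question in a temporary file
# (a stand-in for qtest.csv; the path is generated, not the real file).
raw_lines <- c(
  '"IDEA ID,""IDEA TITLE"",""VOTE VALUE"""',
  '"56144,""Net Present Value PLUS (NPV+)"",1"',
  '"56144,""Net Present Value PLUS (NPV+)"",1"'
)
qtest <- tempfile(fileext = ".csv")
writeLines(raw_lines, qtest)

# Remove the single quotation mark at the start and end of every line.
stripped <- gsub('(^"|"$)', "", readLines(qtest))

# Parse the cleaned lines as ordinary CSV text.
ideas <- read.csv(text = stripped)

# Inspect the structure of the resulting data frame.
str(ideas)
```

With the sample lines this should give a data frame with three columns and two rows. Note that read.csv converts the header names to syntactic R names by default, so the spaces become dots; pass check.names = FALSE if you want to keep them as written.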

Ben Bolker answered Feb 05 '26 03:02


