Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to conditionally remove quotes in write.csv?

Tags:

r

csv

When using write.csv, one may reduce file sizes significantly (approx 25% for large datasets) by removing quotes using quote=FALSE. However, this may cause read.csv to malfunction if commas are present in your data. For example:

x <- data.frame(a=1:2,b=c("hello,","world"))
dim(x)
[1] 2 2
f <- tempfile()
write.csv(x,f,row.names=FALSE,quote=FALSE)
dim(read.csv(f))
[1] 2 2
read.csv(f)
      a  b
1 hello NA
2 world NA

Observe column name misalignment and a loss of data and addition of spurious data.

Is is possible to remove quotes generally, but maintain them for fields that have commas in the data?

like image 534
James Avatar asked Sep 09 '14 11:09

James


People also ask

How do I write a CSV file without quotes?

If you want to write a data frame as a csv file without quoting values and strings then you can set the quote=FALSE argument when calling write. table or one if its wrappers such as write. csv and write. csv2 .

How do you escape quotes in CSV?

By default, the escape character is a " (double quote) for CSV-formatted files. If you want to use a different escape character, use the ESCAPE clause of COPY , CREATE EXTERNAL TABLE or gpload to declare a different escape character.

How do you handle double quotes and commas in a CSV file?

There's actually a spec for CSV format, RFC 4180 and how to handle commas: Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes.


1 Answers

The solution I went with was a combination of the comments of @TimPietzcker and @BenBolker.

quote can be a numeric vector to specify which columns get quoted. While I would have preferred to only quote when needed, this allowed almost the full file-size reduction in my case (also using na="").

commas <- which(sapply(x, function(y) any(grepl(",",y))))
write.csv(x,f,row.names=FALSE,quote=commas)
read.csv(f)
  a      b
1 1 hello,
2 2  world
like image 194
James Avatar answered Sep 30 '22 18:09

James