When using write.csv
, one may reduce file sizes significantly (approx 25% for large datasets) by removing quotes using quote=FALSE
. However, this may cause read.csv
to malfunction if commas are present in your data. For example:
x <- data.frame(a=1:2,b=c("hello,","world"))
dim(x)
[1] 2 2
f <- tempfile()
write.csv(x,f,row.names=FALSE,quote=FALSE)
dim(read.csv(f))
[1] 2 2
read.csv(f)
a b
1 hello NA
2 world NA
Observe column name misalignment and a loss of data and addition of spurious data.
Is is possible to remove quotes generally, but maintain them for fields that have commas in the data?
If you want to write a data frame as a csv file without quoting values and strings then you can set the quote=FALSE argument when calling write. table or one if its wrappers such as write. csv and write. csv2 .
By default, the escape character is a " (double quote) for CSV-formatted files. If you want to use a different escape character, use the ESCAPE clause of COPY , CREATE EXTERNAL TABLE or gpload to declare a different escape character.
There's actually a spec for CSV format, RFC 4180 and how to handle commas: Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes.
The solution I went with was a combination of the comments of @TimPietzcker and @BenBolker.
quote
can be a numeric vector to specify which columns get quoted. While I would have preferred to only quote when needed, this allowed almost the full file-size reduction in my case (also using na=""
).
commas <- which(sapply(x, function(y) any(grepl(",",y))))
write.csv(x,f,row.names=FALSE,quote=commas)
read.csv(f)
a b
1 1 hello,
2 2 world
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With