I have a .csv file with account codes in the form of 00xxxxx and I need them to stay that way for use with other programs which use the account codes in this format. I was just working on an R script to reconcile account charges on Friday and swore that as.is = T
was working for me. Now, it doesn't seem to be. Here's some example data:
test <- data.frame(col1 = c("apple", "banana", "carrot"),
col2 = c(100, 200, 300),
col3 = c("00234", "00345", "00456"))
My write.table
strategy:
write.table(test, file = "C:/path/test.csv", quote = T,
sep=",", row.names = F)
Remove the old data.frame
and re-read:
rm(test)
test <- read.csv("C:/path/test.csv")
test
col1 col2 col3
1 apple 100 234
2 banana 200 345
3 carrot 300 456
In case it's not clear, it should look like the original data.frame
we created:
test
col1 col2 col3
1 apple 100 00234
2 banana 200 00345
3 carrot 300 00456
I also tried the following, after perusing the available read.table
options, with the results all the same as above:
test <- read.csv("C:/path/test.csv", quote = '"')
test <- read.csv("C:/path/test.csv", as.is = T)
test <- read.csv("C:/path/test.csv", as.is = T, quote = '"')
StringsAsFactors
didn't seem to be relevant in this case (and sounds like as.is
will do the same thing.
When I open the file in Emacs, col3
is, indeed, surrounded by quotes, so I'd expect it to be treated like text instead of converted to a number:
Most of the other questions are simply about not treating things like factors, or getting numbers not to be recognized as characters, usually the result of an overlooked character string in that column.
I see I can pursue the colClasses
argument from questions like this one, but I'd prefer not to; my "colClasses" are built into the data :) Quoted = character, not quoted = numeric.
Double-quote escape characters There are 2 accepted ways of escaping double-quotes in a CSV file. One is using a 2 consecutive double-quotes to denote 1 literal double-quote in the data. The alternative is using a backslash and a single double-quote.
For read. csv(), the default quote parameter is quote="\"", which means that only double quotes will be used to delimit strings, not single quotes. Because two of your sample names had apostrophes (single quotes), the read. table() function tried to include everything between those two as a single string.
To add single quotes to strings in an R data frame column, we can use paste0 function. This will cover the strings with single quotes from both the sides but we can add them at the initial or only at the last position.
Sometimes column values in an R data frame have single quote associated with them and to perform the analysis we need to remove that quote. Therefore, to remove single quote from string column, we can use gsub function by defining the single quote and replacing it with blank(not space) as shown in the below examples.
After pinging a couple of friends who are R users, they both suggested using colClasses
. I was relieved to find that I didn't need to specify each class, since my data is ~25 columns. SO confirmed this (once I knew what I was looking for) in another question.
test <- read.csv("C:/path/test.csv", colClasses = c(col3 = "character"))
test
col1 col2 col3
1 apple 100 00234
2 banana 200 00345
3 carrot 300 00456
As it currently stands, the question is a duplicate of the other with respect to the solution. The difference is that I was looking for ways other than colClasses
(since as.is
sounds like such a likely candidate), while that question was about how to use colClasses
.
I'll reiterate that I don't actually like this solution, even thought it's pretty simple. Quotes denote text fields in a .csv, and they don't seem to be respected in this case. The LibreOffice .csv import has a checkbox for "Treat quoted fields as text," which I'd think is analogous to as.is = T
in R. Obviously not! #end_rant
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With