Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

R: respect quotes around numbers (treat as character) with read.csv()?

Tags:

r

formatting

csv

I have a .csv file with account codes in the form of 00xxxxx and I need them to stay that way for use with other programs which use the account codes in this format. I was just working on an R script to reconcile account charges on Friday and swore that as.is = T was working for me. Now, it doesn't seem to be. Here's some example data:

test <- data.frame(col1 = c("apple", "banana", "carrot"),
                   col2 = c(100, 200, 300),
                   col3 = c("00234", "00345", "00456"))

My write.table strategy:

write.table(test, file = "C:/path/test.csv", quote = T,
            sep=",", row.names = F)

Remove the old data.frame and re-read:

rm(test)
test <- read.csv("C:/path/test.csv")
test

    col1 col2 col3
1  apple  100  234
2 banana  200  345
3 carrot  300  456

In case it's not clear, it should look like the original data.frame we created:

test
    col1 col2  col3
1  apple  100 00234
2 banana  200 00345
3 carrot  300 00456

I also tried the following, after perusing the available read.table options, with the results all the same as above:

test <- read.csv("C:/path/test.csv", quote = '"')
test <- read.csv("C:/path/test.csv", as.is = T)
test <- read.csv("C:/path/test.csv", as.is = T, quote = '"')

StringsAsFactors didn't seem to be relevant in this case (and sounds like as.is will do the same thing.

When I open the file in Emacs, col3 is, indeed, surrounded by quotes, so I'd expect it to be treated like text instead of converted to a number:

emacs screenshot

Most of the other questions are simply about not treating things like factors, or getting numbers not to be recognized as characters, usually the result of an overlooked character string in that column.

I see I can pursue the colClasses argument from questions like this one, but I'd prefer not to; my "colClasses" are built into the data :) Quoted = character, not quoted = numeric.

like image 231
Hendy Avatar asked Apr 07 '14 21:04

Hendy


People also ask

How to avoid double quotes in CSV?

Double-quote escape characters There are 2 accepted ways of escaping double-quotes in a CSV file. One is using a 2 consecutive double-quotes to denote 1 literal double-quote in the data. The alternative is using a backslash and a single double-quote.

How do I read a csv file in a quote?

For read. csv(), the default quote parameter is quote="\"", which means that only double quotes will be used to delimit strings, not single quotes. Because two of your sample names had apostrophes (single quotes), the read. table() function tried to include everything between those two as a single string.

How do I put quotes around text in R?

To add single quotes to strings in an R data frame column, we can use paste0 function. This will cover the strings with single quotes from both the sides but we can add them at the initial or only at the last position.

How do I remove quote marks from data in R?

Sometimes column values in an R data frame have single quote associated with them and to perform the analysis we need to remove that quote. Therefore, to remove single quote from string column, we can use gsub function by defining the single quote and replacing it with blank(not space) as shown in the below examples.


1 Answers

After pinging a couple of friends who are R users, they both suggested using colClasses. I was relieved to find that I didn't need to specify each class, since my data is ~25 columns. SO confirmed this (once I knew what I was looking for) in another question.

test <- read.csv("C:/path/test.csv", colClasses = c(col3 = "character"))
test

    col1 col2  col3
1  apple  100 00234
2 banana  200 00345
3 carrot  300 00456

As it currently stands, the question is a duplicate of the other with respect to the solution. The difference is that I was looking for ways other than colClasses (since as.is sounds like such a likely candidate), while that question was about how to use colClasses.

I'll reiterate that I don't actually like this solution, even thought it's pretty simple. Quotes denote text fields in a .csv, and they don't seem to be respected in this case. The LibreOffice .csv import has a checkbox for "Treat quoted fields as text," which I'd think is analogous to as.is = T in R. Obviously not! #end_rant

like image 139
Hendy Avatar answered Oct 21 '22 02:10

Hendy