I am new to R and am trying to read a public Google spreadsheet into an R data frame with numeric columns. My problem seems to be that the exported spreadsheet has commas in large numbers, such as "13,061.422". The read.csv() function treats this as a factor. I tried stringsAsFactors=FALSE and colClasses=c(rep("numeric",7)) but neither worked. Is there a way to coerce the values with commas and decimals to numeric values, either within read.csv() or afterwards when they are treated as Factors in the R dataframe? Here is my code:
require(RCurl)
myCsv <- getURL("https://docs.google.com/spreadsheet/pub?hl=en_US&hl=en_US&key=0Agbdciapt4QZdE95UDFoNHlyNnl6aGlqbGF0cDIzTlE&single=true&gid=0&range=A1%3AG4928&output=csv", ssl.verifypeer=FALSE) #ssl.verifypeer=FALSE gets around certificate issues I don't understand.
fullmatrix <- read.csv(textConnection(myCsv))
str(fullmatrix)
which results in:
'data.frame': 4927 obs. of 7 variables:
$ wave. : Factor w/ 4927 levels "1,000.8900","1,002.8190",..: 4875 4874 4873 4872 4871 4870 4869 4868 4867 4866 ...
$ wavelength : Factor w/ 4927 levels "1,000.074","1,000.267",..: 1 2 3 4 5 6 7 8 9 10 ...
$ d2o : num 85.2 87.7 86.3 87.6 85.6 ...
$ di : num 54.3 55.8 54.9 55.6 54.9 ...
$ ddw : num 48.2 49.7 49.4 50.2 49.6 ...
$ ddw.old : num 53.3 55 53.9 54.8 53.7 ...
$ d2o.ddw.mix: num 65.8 67.9 67.2 68.4 66.8 ...
Thanks for any help! I am new to R, so guessing (hoping) this is an easy one!
Yes. Two methods. The easiest to understand at first is probably to use as.is=TRUE to preserve the columns as character vectors and then use gsub to remove the commas and any currency symbols before converting to numeric. The second is a bit more difficult, but I think more kewl: create an as-method for the format you are using. Then you can use colClasses to do it in one step.
I see @EDi already did version #1 (using stringsAsFactors rather than as.is), so I will document strategy #2:
library(methods)
setClass("num.with.commas")
#[1] "num.with.commas"
setAs("character", "num.with.commas",
function(from) as.numeric(gsub(",", "", from)))
require(RCurl)
#Loading required package: RCurl
#Loading required package: bitops
myCsv <- getURL("https://docs.google.com/spreadsheet/pub?hl=en_US&hl=en_US&key=0Agbdciapt4QZdE95UDFoNHlyNnl6aGlqbGF0cDIzTlE&single=true&gid=0&range=A1%3AG4928&output=csv", ssl.verifypeer=FALSE)
fullmatrix <- read.csv(textConnection(myCsv),
colClasses=c(rep("num.with.commas",2), rep("numeric",5) )) # 2 + 5 = 7 classes, one per column
str(fullmatrix)
#--------------
'data.frame': 4927 obs. of 7 variables:
$ wave. : num 9999 9997 9995 9993 9992 ...
$ wavelength : num 1000 1000 1000 1001 1001 ...
$ d2o : num 85.2 87.7 86.3 87.6 85.6 ...
$ di : num 54.3 55.8 54.9 55.6 54.9 ...
$ ddw : num 48.2 49.7 49.4 50.2 49.6 ...
$ ddw.old : num 53.3 55 53.9 54.8 53.7 ...
$ d2o.ddw.mix: num 65.8 67.9 67.2 68.4 66.8 ...
as-methods are coercive. There are many such methods in base R, such as as.list, as.numeric, and as.character. In each case they attempt to take input that is in one mode and make a sensible copy of it in a different mode. For instance, it makes sense to coerce a matrix to a dataframe because they both have two dimensions. It makes a bit less sense to coerce a dataframe to a matrix (but it does succeed, with loss of all the attributes of the columns and coercion to a common mode).
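To make the matrix/dataframe point concrete, here is a minimal sketch of both coercions. The object names (m, df, m2) and the toy values are made up for illustration and are not part of the original data.

```r
# A small integer matrix with named columns.
m <- matrix(1:6, nrow = 3, dimnames = list(NULL, c("a", "b")))

# Matrix -> data frame: each matrix column becomes a data frame column.
df <- as.data.frame(m)
str(df)

# Add a character column, then coerce back to a matrix.
df$label <- c("x", "y", "z")
m2 <- as.matrix(df)

# A matrix holds a single mode, so every cell has been coerced to character.
mode(m2)
```

The second coercion succeeds, but the numeric columns come back as text, which is the "coercion to a common mode" mentioned above.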
In the present case I am taking a character string as input, removing any commas, and coercing the character values to numeric. Then I use read.table's (in this case by way of read.csv) 'colClasses' argument to dispatch to the as-method I registered with setAs. You may want to go to the help(setAs) page for more details. The S4 class system confuses a lot of people, me included. This is about the only area of success I have had with S4 methods.
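If you want to try the setAs/colClasses dispatch without fetching the spreadsheet, something like the following should work as a self-contained sketch. The inline CSV text and the name csv_text are invented for the example; it relies on the text argument of read.csv instead of a URL.

```r
library(methods)

# Register the class name and the character -> num.with.commas coercion.
setClass("num.with.commas")
setAs("character", "num.with.commas",
      function(from) as.numeric(gsub(",", "", from)))

# Made-up sample data; the quotes keep each comma-formatted number in one field.
csv_text <- 'wavelength,d2o
"1,000.074",85.2
"13,061.422",87.7'

# read.csv looks up the registered coercion for the first column.
df <- read.csv(text = csv_text,
               colClasses = c("num.with.commas", "numeric"))
str(df)
```

Both columns should come back as num, with the commas stripped from the first one before conversion.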
Read the data in with stringsAsFactors = FALSE, remove the commas (with gsub()), and convert to numeric (with as.numeric()):
> fullmatrix <- read.csv(textConnection(myCsv), stringsAsFactors = FALSE)
> str(fullmatrix)
'data.frame': 4927 obs. of 7 variables:
$ wave. : chr "9,999.2590" "9,997.3300" "9,995.4010" "9,993.4730" ...
$ wavelength : chr "1,000.07410549122" "1,000.26707130804" "1,000.46011160533" "1,000.65312629553" ...
$ d2o : num 85.2 87.7 86.3 87.6 85.6 ...
$ di : num 54.3 55.8 54.9 55.6 54.9 ...
$ ddw : num 48.2 49.7 49.4 50.2 49.6 ...
$ ddw.old : num 53.3 55 53.9 54.8 53.7 ...
$ d2o.ddw.mix: num 65.8 67.9 67.2 68.4 66.8 ...
> fullmatrix$wave. <- as.numeric(gsub(",", "", fullmatrix$wave.))
> fullmatrix$wavelength <- as.numeric(gsub(",", "", fullmatrix$wavelength))
> str(fullmatrix)
'data.frame': 4927 obs. of 7 variables:
$ wave. : num 9999 9997 9995 9993 9992 ...
$ wavelength : num 1000 1000 1000 1001 1001 ...
$ d2o : num 85.2 87.7 86.3 87.6 85.6 ...
$ di : num 54.3 55.8 54.9 55.6 54.9 ...
$ ddw : num 48.2 49.7 49.4 50.2 49.6 ...
$ ddw.old : num 53.3 55 53.9 54.8 53.7 ...
$ d2o.ddw.mix: num 65.8 67.9 67.2 68.4 66.8 ...
> fullmatrix[1, 1]
[1] 9999.259
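If there are many such columns, the same gsub() step can be applied to every character column at once. The sketch below is only one way to do it: the helper name to_number, the toy data frame, and the choice to also strip a dollar sign are assumptions for illustration.

```r
# Hypothetical helper: drop thousands separators and dollar signs, then convert.
to_number <- function(x) as.numeric(gsub("[$,]", "", as.character(x)))

# Made-up data standing in for a freshly read data frame.
df <- data.frame(wave  = c("9,999.2590", "9,997.3300"),
                 price = c("$1,250.50", "$980"),
                 d2o   = c(85.2, 87.7),
                 stringsAsFactors = FALSE)

# Convert only the columns that were read as character.
is_chr <- vapply(df, is.character, logical(1))
df[is_chr] <- lapply(df[is_chr], to_number)
str(df)
```

The as.character() call inside the helper means it can also be applied to a column that was already read in as a factor, since it converts the labels and not the underlying integer codes.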