
Efficient use of as.numeric() and factor()

I have several hundred character vectors imported into R from a database, each with a length of 6-7 million. Each one holds either numeric data or factor data with character (letter) labels whose levels still need to be set, and all of them contain some NAs. As an example:

vecA <- c("1",NA, "2",....,NA, "100")
vecB <- c("smith", NA, NA, ... , "jones")

Is there an efficient way to coerce vecA to numeric and vecB to factor? The problem is that I don't know which vectors are numeric and which are factors, and it's tedious to go through them one by one.

Yoda asked Aug 24 '12 18:08


2 Answers

I'd probably use tryCatch(), attempting first to convert each vector to class "numeric". If as.numeric() throws a warning message (as it will when the input vector contains non-numeric characters), I'd catch the warning and instead convert the vector to class "factor".

vecA <- c("1",NA, "2",NA, "100")
vecB <- c("smith", NA, NA, "jones")

myConverter <- function(X) tryCatch(as.numeric(X), 
                                    warning = function(w) as.factor(X))

myConverter(vecA)
# [1]   1  NA   2  NA 100
myConverter(vecB)
# [1] smith <NA>  <NA>  jones
# Levels: jones smith
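
A variation on the same idea, sketched here rather than taken from the answer above: instead of catching the warning, coerce with the warning suppressed and check whether the coercion introduced any new NAs. The helper name convert_by_na_count is hypothetical, and the sketch assumes every input is a character vector.

```r
# Example data from the question (shortened)
vecA <- c("1", NA, "2", NA, "100")
vecB <- c("smith", NA, NA, "jones")

# Hypothetical helper: return a numeric vector if coercion creates no
# new NAs, otherwise fall back to a factor.
convert_by_na_count <- function(x) {
  num <- suppressWarnings(as.numeric(x))
  if (sum(is.na(num)) == sum(is.na(x))) num else factor(x)
}

class(convert_by_na_count(vecA))  # "numeric"
class(convert_by_na_count(vecB))  # "factor"
```

This version does not depend on which warning was raised, but like the tryCatch() approach it still coerces the whole vector once before deciding.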

Josh O'Brien answered Sep 30 '22 16:09


Perhaps a regular expression? For each vector, match things that look like numbers.

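convert.numeric <- function(vec) {
  # grepl() returns FALSE for NA, so the comparison holds only when
  # every non-NA element looks like an unsigned decimal number
  if (all(grepl("^[0-9]*(\\.[0-9]+)?$", vec) == !is.na(vec))) {
    vec <- as.numeric(vec)
  } else {
    vec <- as.factor(vec)
  }
  return(vec)
}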

Then wrap your vectors into a list and use lapply:

new.vectors <- lapply(old.vectors,convert.numeric)
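
As a rough end-to-end sketch of the list-and-lapply step, the block below builds a small named list and checks the resulting classes. The contents of old.vectors are made-up example data, and convert.numeric is restated in compact form (with all() around the vectorised comparison) so the block runs on its own.

```r
# Compact restatement of the converter, for a self-contained example
convert.numeric <- function(vec) {
  looks.numeric <- grepl("^[0-9]*(\\.[0-9]+)?$", vec)
  if (all(looks.numeric == !is.na(vec))) as.numeric(vec) else as.factor(vec)
}

# Made-up example data standing in for the imported vectors
old.vectors <- list(
  vecA = c("1", NA, "2", NA, "100"),
  vecB = c("smith", NA, NA, "jones")
)

new.vectors <- lapply(old.vectors, convert.numeric)

# Inspect the class chosen for each vector: vecA numeric, vecB factor
sapply(new.vectors, class)
```

Keeping the vectors in a named list means the names carry through lapply(), so each converted vector can still be looked up by its original name.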

Blue Magister answered Sep 30 '22 17:09