Efficient use of as.numeric() and factor()

Question

I have several hundred character vectors imported into R from a database, each 6-7 million elements long. Each one holds either numeric data or factor data with character (letter) labels whose levels still need to be set, and all of them contain some NAs. As an example:

vecA <- c("1",NA, "2",....,NA, "100")
vecB <- c("smith", NA, NA, ... , "jones")

Is there an efficient way to coerce vecA to numeric and vecB to factor? The problem is that I don't know which of the vectors are numeric and which are factors, and it's tedious to go through them one by one.

Josh O'Brien · Accepted Answer

I'd probably use tryCatch(), attempting first to convert each vector to class "numeric". If as.numeric() throws a warning message (as it will when the input vector contains non-numeric characters), I'd catch the warning and instead convert the vector to class "factor".

vecA <- c("1",NA, "2",NA, "100")
vecB <- c("smith", NA, NA, "jones")

myConverter <- function(X) tryCatch(as.numeric(X), 
                                    warning = function(w) as.factor(X))

myConverter(vecA)
# [1]   1  NA   2  NA 100
myConverter(vecB)
# [1] smith <NA>  <NA>  jones
# Levels: jones smith
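
To apply this to several hundred vectors at once, one option is to hold them in a named list and run the converter over it with lapply(). This is only a minimal sketch: vecList and converted are hypothetical names, and the two short vectors stand in for the real imported data.

```r
# Converter from the answer above: try numeric first, and fall back to
# factor if as.numeric() signals a coercion warning.
myConverter <- function(X) tryCatch(as.numeric(X),
                                    warning = function(w) as.factor(X))

# Hypothetical container for the imported vectors (names are illustrative).
vecList <- list(vecA = c("1", NA, "2", NA, "100"),
                vecB = c("smith", NA, NA, "jones"))

# Convert every element; the result is a list with the same names.
converted <- lapply(vecList, myConverter)

# Check which class each vector ended up with:
# vecA becomes "numeric", vecB becomes "factor".
sapply(converted, class)
```

Keeping the vectors in a list also avoids having to know in advance which ones are numeric, since the decision is made per element.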

Blue Magister · Answer

Perhaps a regular expression? For each vector, match things that look like numbers.

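convert.numeric <- function(vec) {
  # Treat the vector as numeric only if every non-NA element matches the
  # pattern (grepl() returns FALSE for NA, so NAs agree on both sides).
  if (all(grepl("^[0-9]*(\\.[0-9]+)?$", vec) == !is.na(vec))) {
    vec <- as.numeric(vec)
  } else {
    vec <- as.factor(vec)
  }
  return(vec)
}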

Then wrap your vectors into a list and use lapply:

new.vectors <- lapply(old.vectors,convert.numeric)
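
Note that a pattern-based check and as.numeric() can disagree. The pattern here accepts only unsigned digits with an optional decimal part, so negative numbers or scientific notation would be sent to the factor branch even though as.numeric() can parse them. A small sketch of the check on its own, assuming the same pattern with the backslash doubled as R string literals require; looksNumeric is a hypothetical helper name, not part of the answer:

```r
# Pattern from the answer: optional digits, then an optional decimal part.
pattern <- "^[0-9]*(\\.[0-9]+)?$"

# Hypothetical helper: TRUE when every non-NA element matches the pattern.
looksNumeric <- function(vec) all(grepl(pattern, vec) == !is.na(vec))

looksNumeric(c("1", NA, "2.5"))   # TRUE
looksNumeric(c("smith", NA))      # FALSE
looksNumeric(c("-1", "1e5"))      # FALSE, although as.numeric() parses both
```

If the data may contain signed values or exponents, the pattern would need extending before relying on this check.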

Tags: r, numeric, character

Yoda
