I have several hundred character vectors imported into R from a database, each 6-7 million elements long. Each one holds either numeric data or factor data with character (letter) labels whose levels still need to be set, and all of them contain some NAs. As an example:
vecA <- c("1",NA, "2",....,NA, "100")
vecB <- c("smith", NA, NA, ... , "jones")
Is there an efficient way to coerce vecA to numeric and vecB to factor? The problem is that I don't know which of the vectors are numeric and which are factors, and it's tedious to go through them one by one.
I'd probably use tryCatch(), attempting first to convert each vector to class "numeric". If as.numeric() throws a warning message (as it will when the input vector contains non-numeric characters), I'd catch the warning and instead convert the vector to class "factor".
vecA <- c("1",NA, "2",NA, "100")
vecB <- c("smith", NA, NA, "jones")
myConverter <- function(X) tryCatch(as.numeric(X), 
                                    warning = function(w) as.factor(X))
myConverter(vecA)
# [1]   1  NA   2  NA 100
myConverter(vecB)
# [1] smith <NA>  <NA>  jones
# Levels: jones smith
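The same idea can be sketched without tryCatch() by coercing once and checking whether the coercion introduced any new NAs. This is only a minimal sketch: convertByNaCount is a hypothetical helper name, not something from the answer above, and it assumes every non-missing value in a numeric vector parses cleanly with as.numeric().

```r
# Hypothetical helper (name is an assumption, not from the original answer).
convertByNaCount <- function(x) {
  # Coerce once; the "NAs introduced by coercion" warning is silenced
  num <- suppressWarnings(as.numeric(x))
  # If no new NAs appeared, every non-missing value was numeric
  if (sum(is.na(num)) == sum(is.na(x))) num else factor(x)
}

vecA <- c("1", NA, "2", NA, "100")
vecB <- c("smith", NA, NA, "jones")
resA <- convertByNaCount(vecA)  # numeric vector
resB <- convertByNaCount(vecB)  # factor
```

Each vector is scanned only a few times here, which may matter at 6-7 million elements, though that has not been benchmarked.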
Perhaps a regular expression? For each vector, match things that look like numbers.
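convert.numeric <- function(vec) {
  # Numeric only if every non-NA element matches the number pattern.
  # grepl() returns FALSE for NA, so the comparison also holds for NAs.
  if (all(grepl("^[0-9]*(\\.[0-9]+)?$", vec) == !is.na(vec))) {
    vec <- as.numeric(vec)
  } else {
    vec <- as.factor(vec)
  }
  return(vec)
}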
Then wrap your vectors into a list and use lapply:
new.vectors <- lapply(old.vectors,convert.numeric)
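As a possible alternative to a hand-written converter, base R's utils::type.convert() can be passed to lapply in the same way. The sketch below is an illustration, not part of either answer: the list contents and the names a and b are made up, and type.convert() picks the type itself, so whole-number strings come back as integer rather than double, and strings such as "TRUE" or "T" would become logical.

```r
# Illustrative input; the names a and b are assumptions for this example.
old.vectors <- list(
  a = c("1", NA, "2", NA, "100"),
  b = c("smith", NA, NA, "jones")
)

# as.is = FALSE asks type.convert() to turn non-numeric strings into factors
new.vectors <- lapply(old.vectors, utils::type.convert, as.is = FALSE)

# Inspect the resulting class of each vector
classes <- sapply(new.vectors, class)
```

If double storage is required, as.numeric() can be applied afterwards to the elements that came back as integer.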