I have several hundred character vectors imported into R from a database, each of length 6-7 million. Each one holds either numeric data or factor data with character (letter) labels whose levels still need to be set, and all of them contain some NAs. As an example:
vecA <- c("1",NA, "2",....,NA, "100")
vecB <- c("smith", NA, NA, ... , "jones")
Is there an efficient way to coerce vecA to numeric and vecB to factor? The problem is that I don't know which of the vectors are numeric and which are factors, and it's tedious to go through them one by one.
I'd probably use tryCatch(), attempting first to convert each vector to class "numeric". If as.numeric() throws a warning message (as it will when the input vector contains non-numeric characters), I'd catch the warning and instead convert the vector to class "factor".
vecA <- c("1",NA, "2",NA, "100")
vecB <- c("smith", NA, NA, "jones")
myConverter <- function(X) tryCatch(as.numeric(X),
warning = function(w) as.factor(X))
myConverter(vecA)
# [1] 1 NA 2 NA 100
myConverter(vecB)
# [1] smith <NA> <NA> jones
# Levels: jones smith
Perhaps a regular expression? For each vector, match things that look like numbers.
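convert.numeric <- function(vec) {
  # grepl() returns FALSE for NA, so this requires every non-NA entry to look like a number.
  # Note: this pattern does not match signs or exponents (e.g. "-1", "1e5").
  if (all(grepl("^[0-9]*(\\.[0-9]+)?$", vec) == !is.na(vec))) {
    vec <- as.numeric(vec)
  } else {
    vec <- as.factor(vec)
  }
  return(vec)
}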
Then wrap your vectors into a list and use lapply:
new.vectors <- lapply(old.vectors,convert.numeric)
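As a rough end-to-end sketch of the lapply step, assuming the vectors have been collected into a named list: the sample data and the name convert_sketch are hypothetical, and the function simply restates the regex check from convert.numeric so the block runs on its own.

```r
# Hypothetical stand-in for the imported data: a named list of character vectors
old.vectors <- list(
  vecA = c("1", NA, "2", NA, "100"),
  vecB = c("smith", NA, NA, "jones")
)

# Regex-based converter (hypothetical name), same logic as convert.numeric
convert_sketch <- function(vec) {
  looks_numeric <- grepl("^[0-9]*(\\.[0-9]+)?$", vec)
  # grepl() returns FALSE for NA, so NAs never block the numeric branch
  if (all(looks_numeric == !is.na(vec))) as.numeric(vec) else as.factor(vec)
}

new.vectors <- lapply(old.vectors, convert_sketch)

# Check which class each vector ended up with
classes <- sapply(new.vectors, class)
print(classes)
```

Inspecting the classes afterwards is a cheap way to spot vectors that were sent to the factor branch because of a stray non-numeric entry.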