I have a data frame with about 250 variables. Unfortunately, all of these variables were imported as character from a SQL database using sqldf.
The problem: not all of them should be character. There are numeric variables, integers, and dates. I'd like to build a model that runs over all the variables, and to do this I need to make sure the variables have the right classes. Doing it one by one is probably the most accurate approach, but it is very manual.
How could I automatically correct all classes? Perhaps a way to detect whether there are alphabet characters in the column or only number characters?
I don't think an automatic approach can be perfect at correcting all classes. But it might correct most of them, and I can then fix the remaining ones manually.
I am adding a sqldf tag in case anybody knows of any way to correct this when importing the data, but I assume it's not sqldf's fault but rather the database's.
The closest thing to "automatic" type conversion on a data frame would probably be
df[] <- lapply(df, type.convert)
where df is your data set. Per its help page, the function type.convert()
"Converts a character vector to logical, integer, numeric, complex or factor as appropriate."
Have a read of help(type.convert); it might be just what you want.
In my experience, type.convert() is very reliable. It is best to pass as.is explicitly: as.is = TRUE keeps non-numeric columns as character, while as.is = FALSE coerces them to factors. It is also used internally by core R functions such as read.table, so it is well exercised.
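To make the effect of as.is concrete, here is a small sketch on toy vectors (the vectors and the name classes are made up for illustration, not taken from the question's data):

```r
# Toy character vectors, each standing in for one imported column
classes <- c(
  keep_char = class(type.convert(c("a", "b", "a"), as.is = TRUE)),
  to_factor = class(type.convert(c("a", "b", "a"), as.is = FALSE)),
  ints      = class(type.convert(c("1", "2", "3"), as.is = TRUE)),
  doubles   = class(type.convert(c("1.5", "2"), as.is = TRUE)),
  logicals  = class(type.convert(c("TRUE", "F", NA), as.is = TRUE))
)
classes
#   keep_char   to_factor        ints     doubles    logicals
# "character"    "factor"   "integer"   "numeric"   "logical"
```

Only the treatment of non-numeric, non-logical text depends on as.is; numbers and logicals are detected either way.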
Here's a quick example of it working on iris. First we'll change all columns to character, then run type.convert() on it.
## Original column classes in iris
sapply(iris, class)
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# "numeric" "numeric" "numeric" "numeric" "factor"
## Change all columns to character
iris[] <- lapply(iris, as.character)
sapply(iris, class)
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# "character" "character" "character" "character" "character"
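## Run type.convert()
## as.is = FALSE so that character columns become factors
## (recent versions of R warn if as.is is not specified)
iris[] <- lapply(iris, type.convert, as.is = FALSE)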
sapply(iris, class)
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# "numeric" "numeric" "numeric" "numeric" "factor"
We can see that the columns were returned to their original classes. This is because type.convert() coerces columns to the "most appropriate" type.
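Note that type.convert() does not produce dates, which the question also mentions. One possible follow-up pass is sketched below; the helper to_date_if_possible, the toy data frame, and the assumed "%Y-%m-%d" format are all hypothetical and would need adjusting to the real data. A column is converted only if every non-missing value parses as a date.

```r
# Hypothetical helper: turn a character column into Date only when every
# non-missing value parses with the given format; otherwise return it as is.
to_date_if_possible <- function(x, format = "%Y-%m-%d") {
  if (!is.character(x)) return(x)
  parsed <- as.Date(x, format = format)
  present <- !is.na(x)
  if (any(present) && !anyNA(parsed[present])) parsed else x
}

# Toy data standing in for the all-character import
df <- data.frame(
  id     = c("1", "2", "3"),
  score  = c("1.5", "2.25", NA),
  joined = c("2020-01-31", "2021-06-15", "2019-12-01"),
  name   = c("ann", "bob", "cy"),
  stringsAsFactors = FALSE
)

df[] <- lapply(df, type.convert, as.is = TRUE)  # numbers and logicals
df[] <- lapply(df, to_date_if_possible)         # then dates
sapply(df, class)
#          id       score      joined        name
#   "integer"   "numeric"      "Date" "character"
```

Columns that are still character afterwards are the ones to inspect manually. Keep in mind that as.Date() with a format tolerates trailing characters after a valid date, so a stricter check may be needed for messy data.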