You know how you can supply a vector of names to a data frame to change its column or row names? Is there a similar way to supply a vector of class names that changes the class of each column in a data frame? You can do this when reading data with read.table by using colClasses. What about a data frame that is created inside R?
DF <- as.data.frame(matrix(rnorm(25), 5, 5))
str(DF) #all numeric modes
names(DF) <- c("A", "A2", "B", "B2", "Z") #I want something like this for classes
some_classes_function_like_names(DF) <- c(rep("character", 3), rep("factor", 2))
#I can do it like this but this seems inefficient
DF[, 1:3] <- lapply(DF[, 1:3], as.character)
DF[, 4:5] <- lapply(DF[, 4:5], as.factor)
str(DF)
EDIT: I changed sapply above to lapply as sapply doesn't make sense.
EDIT 2: A user-defined function that does this would suffice as well.
Try this:
cls <- c(rep("character", 3), rep("factor", 2)) # target class for each column, as in the question
toCls <- function(x, cls) do.call(paste("as", cls, sep = "."), list(x))
replace(DF,, Map(toCls, DF, cls))
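If the names()-style assignment syntax from the question is what you are after, the same idea can be wrapped in a replacement function. This is only a minimal sketch: the name `classes<-` is hypothetical (it is not a base R function), and it assumes an as.<class> function exists for every class name you pass.

```r
# Hypothetical replacement function; `classes<-` is not part of base R.
`classes<-` <- function(x, value) {
  # Recycle the class names so a single name applies to every column.
  value <- rep(value, length.out = length(x))
  # Build "as.<class>" for each column and call it on that column.
  x[] <- Map(function(col, cls) do.call(paste0("as.", cls), list(col)), x, value)
  x
}

DF <- as.data.frame(matrix(rnorm(25), 5, 5))
classes(DF) <- c(rep("character", 3), rep("factor", 2))
sapply(DF, class)
```

Because it is a replacement function, R rewrites classes(DF) <- value as DF <- `classes<-`(DF, value), which is the same mechanism names(DF) <- value uses.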
Second example. Also try this example, which allows NA to be used for any column whose class is not to be changed. We load the zoo package since it provides a version of as.Date that has a default origin, and we define our own as.POSIXct2 to likewise avoid having to specify the origin.
library(zoo) # supplies alternate as.Date with a default origin
as.NA <- identity
as.POSIXct2 <- function(x) as.POSIXct(x, origin = "1970-01-01")
cls2 <- c("character", "Date", NA, "factor", "POSIXct2")
replace(DF,, Map(toCls, DF, cls2))
Note that it is only when converting numbers to "Date" or "POSIXct" that there are origin considerations. When converting character strings such as "2000-01-01", no origin needs to be specified, so in that situation we would not need to load zoo or define our own version of as.POSIXct.
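To illustrate the character-string case, here is a small sketch that uses base R only. The data frame and its column names (DF_chr, d, t) are made up for the example, and the date-times are parsed in the session's local time zone.

```r
# Made-up character columns holding ISO-style dates and date-times.
DF_chr <- data.frame(d = c("2000-01-01", "2000-01-02"),
                     t = c("2000-01-01 12:00:00", "2000-01-02 13:30:00"),
                     stringsAsFactors = FALSE)

# Same helper as above, repeated so this snippet runs on its own.
toCls <- function(x, cls) do.call(paste("as", cls, sep = "."), list(x))

# No zoo and no origin argument: the strings carry the full date.
DF_dates <- DF_chr
DF_dates[] <- Map(toCls, DF_chr, c("Date", "POSIXct"))
str(DF_dates)
```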
EDIT: Added another example.
It seems class(x) <- "factor" doesn't work and neither does as(x, "factor"), so I don't know of a direct way of doing what you want.
...But a slightly more explicit way is:
# Coerces data.frame columns to the specified classes
colClasses <- function(d, colClasses) {
  # Recycle the class names so a single name applies to every column
  colClasses <- rep(colClasses, length.out = length(d))
  d[] <- lapply(seq_along(d), function(i) switch(colClasses[i],
    numeric   = as.numeric(d[[i]]),
    character = as.character(d[[i]]),
    Date      = as.Date(d[[i]], origin = '1970-01-01'),
    POSIXct   = as.POSIXct(d[[i]], origin = '1970-01-01'),
    factor    = as.factor(d[[i]]),
    as(d[[i]], colClasses[i])))  # default case: fall back to as()
  d
}
# Example usage
DF <- as.data.frame(matrix(rnorm(25), 5, 5))
DF2 <- colClasses(DF, c(rep("character", 3), rep("factor", 2)))
str(DF2)
DF3 <- colClasses(DF, 'Date')
str(DF3)
A couple of things: you can add more cases as needed, and the first line of the function recycles the class names, so you can call it with a single class name. The last, "default" case of the switch calls the as function, and your mileage may vary.
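As one example of adding a case, the switch-based approach can be combined with the NA convention from the other answer. The sketch below is an assumption-laden variant, not part of either answer: the name coerce_cols is hypothetical, only two explicit cases are kept for brevity, and everything else falls through to as().

```r
# Hypothetical helper: like the switch-based function above, but an NA
# class name means "leave this column as it is".
coerce_cols <- function(d, classes) {
  classes <- rep(classes, length.out = length(d))
  d[] <- lapply(seq_along(d), function(i) {
    cls <- classes[i]
    if (is.na(cls)) return(d[[i]])  # keep the column untouched
    switch(cls,
           factor = as.factor(d[[i]]),
           Date   = as.Date(d[[i]], origin = "1970-01-01"),
           as(d[[i]], cls))         # default case: methods::as()
  })
  d
}

DF <- as.data.frame(matrix(rnorm(25), 5, 5))
DF4 <- coerce_cols(DF, c("character", NA, "integer", "factor", "Date"))
sapply(DF4, class)
```

The default case works here because as() knows how to convert between basic vector types such as numeric, integer, and character; for other classes it may fail, which is why the explicit cases exist.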