
supply a vector to "classes" of dataframe

Tags:

r

You can supply a vector of names to a data frame to change its column or row names. Is there a similar way to supply a vector of class names that changes the class of each column in a data frame? You can do this when you read in a data frame with read.table using colClasses, but what about a data frame that is created inside R?

DF <- as.data.frame(matrix(rnorm(25), 5, 5))
str(DF)  #all numeric modes

names(DF) <- c("A", "A2", "B", "B2", "Z") #I want something like this for classes
some_classes_function_like_names(DF) <- c(rep("character", 3), rep("factor", 2))

#I can do it like this but this seems inefficient 
DF[, 1:3] <- lapply(DF[, 1:3], as.character)
DF[, 4:5] <- lapply(DF[, 4:5], as.factor)

str(DF)

EDIT: I changed sapply above to lapply as sapply doesn't make sense.

EDIT 2: A user-defined function that does this would suffice as well.

asked Feb 09 '12 16:02 by Tyler Rinker


2 Answers

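Try this:

# Convert x by calling as.<cls>, e.g. as.character or as.factor
toCls <- function(x, cls) do.call(paste("as", cls, sep = "."), list(x))
cls <- c(rep("character", 3), rep("factor", 2))  # one target class per column
replace(DF,, Map(toCls, DF, cls))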

Second example. This variant allows NA to be used for any column whose class is not to be changed. We load the zoo package because it provides a version of as.Date that has a default origin, and we define our own as.POSIXct2 so that the origin does not have to be specified there either.

library(zoo) # supplies alternate as.Date with a default origin
as.NA <- identity
as.POSIXct2 <- function(x) as.POSIXct(x, origin = "1970-01-01")

cls2 <- c("character", "Date", NA, "factor", "POSIXct2")
replace(DF,, Map(toCls, DF, cls2))

Note that origin considerations arise only when converting numbers to "Date" or "POSIXct". When converting character strings such as "2000-01-01", no origin needs to be specified, so in that situation we would not need to load zoo or define our own version of as.POSIXct.
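To illustrate the character-string case, here is a minimal sketch. The data frame DF_chr and its column names are made up for the example, and it uses DF_conv[] <- ... rather than replace(), which should be equivalent here. No zoo and no origin are involved.

```r
# Same helper as above: dispatch to as.<cls>()
toCls <- function(x, cls) do.call(paste("as", cls, sep = "."), list(x))

# Hypothetical data: dates and numbers stored as character strings
DF_chr <- data.frame(d = c("2000-01-01", "2000-01-02"),
                     n = c("1.5", "2.5"),
                     stringsAsFactors = FALSE)

# Convert column d to Date and column n to numeric; no origin is needed
DF_conv <- DF_chr
DF_conv[] <- Map(toCls, DF_chr, c("Date", "numeric"))
str(DF_conv)
```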

EDIT: Added another example.

answered Oct 06 '22 00:10 by G. Grothendieck


It seems class(x) <- "factor" doesn't work and neither does as(x, "factor"), so I don't know of a direct way of doing what you want.
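A quick way to check this yourself is the sketch below. It wraps both attempts in tryCatch so the script keeps running; the variable names are only for illustration, and the exact error messages may differ between R versions.

```r
library(methods)  # provides as(); usually attached already

x <- rnorm(3)

# Attempt 1: set the class attribute directly (expected to fail for doubles)
res1 <- tryCatch({ class(x) <- "factor"; "ok" },
                 error = function(e) conditionMessage(e))

# Attempt 2: coerce with as() (expected to fail: no numeric -> factor method)
res2 <- tryCatch({ as(x, "factor"); "ok" },
                 error = function(e) conditionMessage(e))

print(res1)
print(res2)

# The dedicated coercion function is the supported route
f <- as.factor(x)
str(f)
```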

...But a slightly more explicit way is:

# Coerces data.frame columns to the specified classes
colClasses <- function(d, colClasses) {
    colClasses <- rep(colClasses, length.out=length(d))  # recycle to one class per column
    d[] <- lapply(seq_along(d), function(i) switch(colClasses[i], 
        numeric=as.numeric(d[[i]]), 
        character=as.character(d[[i]]), 
        Date=as.Date(d[[i]], origin='1970-01-01'), 
        POSIXct=as.POSIXct(d[[i]], origin='1970-01-01'), 
        factor=as.factor(d[[i]]),
        as(d[[i]], colClasses[i]) ))
    d
}

# Example usage
DF <- as.data.frame(matrix(rnorm(25), 5, 5))
DF2 <- colClasses(DF, c(rep("character", 3), rep("factor", 2)))
str(DF2)

DF3 <- colClasses(DF, 'Date')
str(DF3)

A couple of things: you can add more cases to the switch as needed, and the first line of the function recycles the class vector, so you can call it with a single class name. The last, "default" case of the switch calls the as function, and your mileage may vary with it.
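If you want syntax that mirrors names(DF) <- ..., as in the question, one option is to wrap the same idea in a replacement function. This is only a sketch: `classes<-` is a hypothetical name, not part of base R, and it simply looks up as.<class> for each column, so it inherits the same origin caveats for dates.

```r
# Hypothetical replacement function; `classes<-` does not exist in base R
`classes<-` <- function(x, value) {
  value <- rep(value, length.out = length(x))  # recycle a single class name
  x[] <- Map(function(col, cls) {
    if (is.na(cls)) return(col)                # NA leaves the column unchanged
    match.fun(paste0("as.", cls))(col)         # e.g. as.character, as.factor
  }, x, value)
  x
}

# Example usage
DF <- as.data.frame(matrix(rnorm(25), 5, 5))
classes(DF) <- c(rep("character", 3), rep("factor", 2))
str(DF)
```

Because R rewrites classes(DF) <- v as DF <- `classes<-`(DF, value = v), the final argument must be named value.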

answered Oct 06 '22 00:10 by Tommy