
Elegant way to get the colclasses of a data.frame

Tags:

dataframe

r

I currently use the following function to list the classes of a data.frame:

sapply(names(iris),function(x) class(iris[,x]))

There must be a more elegant way to do this...

asked Nov 14 '11 at 22:11 by Zach


2 Answers

Since data.frames are already lists (one element per column), sapply(iris, class) will just work. Note, however, that sapply can't simplify the result to a character vector when a column has more than one class (as happens with classes that extend other classes). In that case you may want to take only the first class, paste the classes together, etc.
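A quick check, written as a sketch against the built-in iris dataset: because a data.frame is a list of columns, sapply() iterates over the columns directly, and for iris the result should match the question's name-based version. The variable names below are only illustrative.

```r
# A data.frame is a list of columns, so sapply() can iterate over it directly
stopifnot(is.list(iris))

# Direct version: a named character vector, one class per column
classes_direct <- sapply(iris, class)

# The question's original version, indexing each column by name
classes_by_name <- sapply(names(iris), function(x) class(iris[, x]))

print(classes_direct)

# For iris (every column has a single class) both give the same named vector
print(identical(classes_direct, classes_by_name))
```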

answered Nov 04 '22 at 09:11 by Joshua Ulrich


EDIT: If you just want to LOOK at the classes, consider using str:

str(iris) # Compactly display the structure of a data.frame (or any other object)
#'data.frame':   150 obs. of  5 variables:
# $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
# $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
# $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
# $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
# $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

But to expand on @JoshuaUlrich's excellent answer: a data.frame with time or ordered factor columns causes trouble for the sapply solution, because those columns have more than one class:

d <- data.frame(ID=1, time=Sys.time(), factor=ordered(42))

# This no longer returns a character vector; it returns a list
sapply(d, class)
#$ID
#[1] "numeric"
#
#$time
#[1] "POSIXct" "POSIXt" 
#
#$factor
#[1] "ordered" "factor" 

# Alternative 1: Get the first class
sapply(d, function(x) class(x)[[1]])
#       ID      time    factor 
#"numeric" "POSIXct" "ordered"

# Alternative 2: Paste classes together
sapply(d, function(x) paste(class(x), collapse='/'))
#          ID             time           factor 
#   "numeric" "POSIXct/POSIXt" "ordered/factor"     
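A possible variation, assuming you want the result to always be a plain character vector: Alternatives 1 and 2 can also be written with vapply(), which checks that each column yields exactly one string. This is a sketch that rebuilds the same d as above; the variable names first_class and all_classes are just for illustration.

```r
d <- data.frame(ID = 1, time = Sys.time(), factor = ordered(42))

# Alternative 1 via vapply: first class only, exactly one string per column
first_class <- vapply(d, function(x) class(x)[1], character(1))

# Alternative 2 via vapply: all classes joined with "/"
all_classes <- vapply(d, function(x) paste(class(x), collapse = "/"), character(1))

print(first_class)
print(all_classes)
```

Because vapply() is told the expected shape of each result (character(1)), a function that accidentally returned a longer vector would raise an error instead of silently producing a list.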

Note that none of these solutions is perfect. Taking only the first (or last) class can return something quite meaningless (for a POSIXct column, the last class is just the generic "POSIXt"). Pasting makes the compound class harder to use, e.g. in comparisons. Sometimes you just want to detect when a column has multiple classes, in which case an error is preferable (and I love vapply ;-):

# Alternative 3: Fail if there are multiple-class columns
vapply(d, class, character(1))
#Error in vapply(d, class, character(1)) : values must be length 1,
# but FUN(X[[2]]) result is length 2
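If you'd rather find the multiple-class columns than stop with an error, one option is to count the classes of each column and report those with more than one. This is a sketch, assuming R 3.2.0 or later (for lengths()); the names n_classes and multi_class_cols are illustrative.

```r
d <- data.frame(ID = 1, time = Sys.time(), factor = ordered(42))

# Number of classes per column (named integer vector)
n_classes <- lengths(lapply(d, class))

# Names of the columns whose class vector has more than one element
multi_class_cols <- names(d)[n_classes > 1]
print(multi_class_cols)
```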

answered Nov 04 '22 at 08:11 by Tommy