
Subset variables in data frame based on column type

Tags:

r

I need to subset a data frame based on column type. For example, from a data frame with 100 columns I need to keep only those columns of type factor or integer. I've written a short function to do this, but is there a simpler solution, a built-in function, or a package on CRAN?

My current solution to get variable names with requested types:

varlist <- function(df=NULL, vartypes=NULL) {
  type_function <- c("is.factor","is.integer","is.numeric","is.character","is.double","is.logical")
  names(type_function) <- c("factor","integer","numeric","character","double","logical")
  names(df)[as.logical(sapply(lapply(names(df), function(y) sapply(type_function[names(type_function) %in% vartypes], function(x) do.call(x,list(df[[y]])))),sum))]  
}

The function varlist works as follows:

  1. For every requested type and every column in the data frame, call the corresponding "is.TYPE" function
  2. Sum the test results for every variable (logical values are coerced to integer automatically)
  3. Cast the result to a logical vector
  4. Subset the names of the data frame
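
The four steps above can also be written out one at a time. The following is only a sketch of that unrolling, not the original function: `varlist_steps` and the small `toy` data frame are hypothetical names introduced for illustration, and the sketch assumes every requested type has a matching base R `is.TYPE` function.

```r
# Hypothetical unrolled version of the four steps; `varlist_steps` and `toy`
# are illustrative names, not part of the original code.
varlist_steps <- function(df, vartypes) {
  # Step 1: one "is.TYPE" test per requested type, applied to every column.
  tests <- paste0("is.", vartypes)
  hits <- sapply(df, function(col) {
    sapply(tests, function(f) do.call(f, list(col)))
  })
  # sapply() gives a vector for one type and a matrix for several,
  # so force a types-by-columns matrix in both cases.
  hits <- matrix(hits, nrow = length(tests))
  # Steps 2 and 3: sum the tests per column, then cast back to logical.
  keep <- as.logical(colSums(hits))
  # Step 4: subset the column names.
  names(df)[keep]
}

toy <- data.frame(a = 1:3, b = c(1.5, 2.5, 3.5), c = c(TRUE, FALSE, TRUE),
                  d = c("x", "y", "x"), stringsAsFactors = FALSE)
varlist_steps(toy, c("integer", "logical"))   # "a" "c"
```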

And some data to test it:

df <- read.table(file="http://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/german.data", sep=" ", header=FALSE, stringsAsFactors=TRUE)
names(df) <- c('ca_status','duration','credit_history','purpose','credit_amount','savings', 'present_employment_since','installment_rate_income','status_sex','other_debtors','present_residence_since','property','age','other_installment','housing','existing_credits', 'job','liable_maintenance_people','telephone','foreign_worker','gb')
df$gb <- ifelse(df$gb == 2, FALSE, TRUE)
df$property <- as.character(df$property)
varlist(df, c("integer","logical"))

I'm asking because my code looks really cryptic and is hard to understand (even for me, and I finished the function only 10 minutes ago).

Tomas Greif asked Jul 31 '13 07:07


2 Answers

Just do the following:

df[,sapply(df,is.factor) | sapply(df,is.integer)]
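
One caveat that may be worth keeping in mind: with `df[, cols]`, a selection that matches exactly one column is simplified to a plain vector. A minimal sketch of two ways around that, using a hypothetical `toy` data frame rather than the data from the question:

```r
# `toy` is an illustrative data frame, not the data from the question.
toy <- data.frame(a = 1:3, b = c(1.5, 2.5, 3.5),
                  d = factor(c("x", "y", "x")))

keep <- sapply(toy, is.factor) | sapply(toy, is.integer)

# Both forms return a data frame, even when only one column matches.
both <- toy[, keep, drop = FALSE]
only_factor <- toy[sapply(toy, is.factor)]

names(both)          # "a" "d"
class(only_factor)   # "data.frame"
```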
Thomas answered Oct 26 '22 13:10


subset_colclasses <- function(DF, colclasses="numeric") {
  DF[,sapply(DF, function(vec, test) class(vec) %in% test, test=colclasses)]
}

str(subset_colclasses(df, c("factor", "integer")))
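
A possible edge case: a column with more than one class, such as an ordered factor (classes "ordered" and "factor"), makes `class(vec) %in% test` return one value per class, so `sapply()` no longer yields a plain logical vector. The sketch below is one way to guard against that, assuming a match on any of a column's classes should count; `subset_colclasses2` and `toy` are hypothetical names used only for illustration.

```r
# `subset_colclasses2` and `toy` are hypothetical names used for illustration.
subset_colclasses2 <- function(DF, colclasses = "numeric") {
  # any() collapses the per-class matches to one TRUE/FALSE per column,
  # and vapply() guarantees a logical vector.
  keep <- vapply(DF, function(vec) any(class(vec) %in% colclasses), logical(1))
  # drop = FALSE keeps a data frame even if only one column matches.
  DF[, keep, drop = FALSE]
}

toy <- data.frame(a = 1:3, b = c(1.5, 2.5, 3.5),
                  o = factor(c("lo", "hi", "lo"), levels = c("lo", "hi"),
                             ordered = TRUE))
names(subset_colclasses2(toy, c("factor", "integer")))   # "a" "o"
```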
Roland answered Oct 26 '22 12:10