
Apply a function to each column in a data frame, respecting each column's existing data type

Tags: r, sapply, apply

I'm trying to get the min/max for each column in a large data frame, as part of getting to know my data. My first try was:

apply(t,2,max,na.rm=1) 

It treats everything as a character vector, because the first few columns are character types. So max of some of the numeric columns is coming out as " -99.5".

I then tried this:

sapply(t,max,na.rm=1) 

but it complains about max not meaningful for factors. (lapply is the same.) What is confusing me is that apply thought max was perfectly meaningful for factors, e.g. it returned "ZEBRA" for column 1.
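A minimal reproduction of both behaviours, assuming a made-up data frame d (its name and columns are illustrative, not the real data):

```r
# Hypothetical stand-in for the real data: one factor column, one numeric column.
d <- data.frame(
  animal = factor(c("ANT", "ZEBRA", "CAT")),
  score  = c(-99.5, 3, 12.25)
)

# apply() operates on a matrix, so the data frame is passed through as.matrix();
# with a non-numeric column present, every cell ends up as a character string.
m <- as.matrix(d)
print(typeof(m))
print(apply(d, 2, max))   # both results are character strings

# sapply() hands each column to max() unchanged, so the factor column fails.
res <- tryCatch(sapply(d, max), error = function(e) conditionMessage(e))
print(res)
```

The error message is captured with tryCatch only so that the script runs to the end; at the console the sapply call simply stops with the error.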

BTW, I took a look at Using sapply on vector of POSIXct and one of the answers says "When you use sapply, your objects are coerced to numeric,...". Is this what is happening to me? If so, is there an alternative apply function that does not coerce? Surely it is a common need, as one of the key features of the data frame type is that each column can be a different type.

Darren Cook asked Sep 05 '11 02:09



1 Answer

If it were an "ordered factor", things would be different. That is not to say I like ordered factors (I don't), only that some relations, max and min among them, are defined for ordered factors but not for plain factors. Plain factors are treated as ordinary categorical variables. What apply is showing you is the sort order of the labels as character strings, which is lexical order for your locale: apply first converts the data frame to a matrix, and with mixed column types that matrix is a character matrix. If you want automatic coercion to "numeric" for every column, dates and factors and all, then try:

sapply(df, function(x) max(as.numeric(x)) )   # not generally a useful result 

Or if you want to test for factors first and return as you expect then:

sapply(df, function(x) {
  if ("factor" %in% class(x)) {
    max(as.numeric(as.character(x)))
  } else {
    max(x)
  }
})

@Darren's suggestion from the comments does work better:

 sapply(df, function(x) max(as.character(x)) )   

max does succeed with character vectors.
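If the goal is to keep each column's own type in the result rather than coercing everything to character, one option is lapply, which returns a list and so does not force a common type. This is only a sketch under assumptions: the data frame d, its column names, and the choice to compare unordered factors by their labels are all illustrative.

```r
# Hypothetical data frame with a factor, a numeric and a Date column.
d <- data.frame(
  animal = factor(c("ANT", "ZEBRA", "CAT")),
  score  = c(-99.5, 3, NA),
  when   = as.Date(c("2011-09-05", "2011-01-01", "2011-06-30"))
)

# lapply() returns a list, so each result keeps the type max() gives it.
# Unordered factors are compared by label; ordered factors are left alone
# because max() is already defined for them.
col_max <- lapply(d, function(x) {
  if (is.factor(x) && !is.ordered(x)) x <- as.character(x)
  max(x, na.rm = TRUE)
})
str(col_max)

# An ordered factor supports max() directly and respects the level order.
size <- factor(c("small", "large", "medium"),
               levels = c("small", "medium", "large"), ordered = TRUE)
biggest <- max(size)
print(as.character(biggest))
```

Using sapply here instead would try to simplify the list into a single vector, which brings the coercion problem back.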

IRTFM answered Oct 18 '22 04:10