
Iterating a function through different columns of a data.frame matching a pattern in the column names

Tags: for-loop, r, lapply

I want to apply a function to several columns of a data.frame whose names share a common pattern. To subset the data.frame I use this code, which works:

df[,grep("abc", colnames(df))]

However, I don't know how to apply my function f(x) to all the columns that match this pattern, using either a for loop or lapply.

The function I'm using is:

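# dates() and hours() are assumed to come from the chron package
compress <- function(x) {
  # Sum column x of the global df within each date-hour group
  aggregate(df[, x, drop = FALSE],
            list(hour = with(df, paste(dates(Time),
                                       sprintf("%d:00:00", hours(Time))))),
            sum, na.rm = TRUE)
}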

where df (the data frame) and Time could themselves be made arguments of the function, but for the moment I don't need to do that.

Thanks, Giulia

Giulia asked Aug 15 '13 12:08



1 Answer

You've basically got it. Use apply on the subsetted columns to apply a function f to each of them. The 2 in the second argument tells apply to work over columns; a 1 would make it work over rows:

apply( df[, grep("abc", colnames(df)), drop = FALSE] , 2 , f )

Or, if you don't want your data.frame coerced to a matrix (which is what apply does), you can use lapply, as you suggest, in much the same way:

lapply( df[, grep("abc", colnames(df)), drop = FALSE] , f )

lapply returns a list with one element per column. You can turn it back into a data.frame by wrapping the lapply call in data.frame(), e.g. data.frame( lapply(...) )
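
A caveat about this kind of subsetting, shown as a minimal sketch: when the pattern matches only one column, df[, idx] without drop = FALSE returns a bare vector instead of a data.frame, so lapply would loop over individual values. The data frame df1 and its column names are hypothetical and chosen only to trigger the single-match case.

```r
# Hypothetical toy data: only one column name contains "abc"
df1 <- data.frame(abc_1 = 1:3, xyz = 4:6)
idx <- grep("abc", colnames(df1))

f <- function(x) x * 2

# Without drop = FALSE the single match collapses to an integer vector,
# so lapply() iterates over its elements rather than over a column
dropped <- df1[, idx]
is.data.frame(dropped)        # FALSE
length(lapply(dropped, f))    # 3, one result per element

# With drop = FALSE the result stays a one-column data.frame
kept <- df1[, idx, drop = FALSE]
is.data.frame(kept)           # TRUE
res <- data.frame(lapply(kept, f))
```

List-style indexing, df1[idx], also keeps the data.frame class regardless of how many columns match.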

Example

# This function just multiplies its argument by 2
f <- function(x) x * 2

df <- data.frame( AB = runif(5) , AC = runif(5) , BB = runif(5) )


apply( df[,grep("A", colnames(df))] , 2 , f )
#            AB        AC
#[1,] 0.4130628 1.3302304
#[2,] 0.2550633 0.1896813
#[3,] 1.5066157 0.7679393
#[4,] 1.7900907 0.5487673
#[5,] 0.7489256 1.6292801


data.frame( lapply( df[,grep("A", colnames(df))] , f ) )
#         AB        AC
#1 0.4130628 1.3302304
#2 0.2550633 0.1896813
#3 1.5066157 0.7679393
#4 1.7900907 0.5487673
#5 0.7489256 1.6292801

# Note the important difference between the two methods...
class( data.frame( lapply( df[,grep("A", colnames(df))] , f ) ) )
#[1] "data.frame"
class( apply( df[,grep("A", colnames(df))] , 2 , f ) )
#[1] "matrix"    (R >= 4.0.0 prints "matrix" "array")
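
Since the question also mentions a for loop, here is a rough equivalent of the lapply version written as an explicit loop. It is only a sketch: toy_df, cols and out are hypothetical names, and the fixed values replace runif() so the result is predictable.

```r
# Hypothetical toy data with the same column names as the example above
toy_df <- data.frame(AB = c(1, 2, 3), AC = c(4, 5, 6), BB = c(7, 8, 9))
f <- function(x) x * 2

# Names (not positions) of the columns matching the pattern
cols <- grep("A", colnames(toy_df), value = TRUE)

# Pre-allocate a named list, then fill it one column at a time
out <- vector("list", length(cols))
names(out) <- cols
for (col in cols) {
  out[[col]] <- f(toy_df[[col]])
}

# Combine the per-column results back into a data.frame
result <- as.data.frame(out)
```

The loop does the same work as lapply; lapply is usually preferred in R because it handles the allocation and naming of the result list for you.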

Second edit

For the function you actually want to run, it might be easier to rewrite it so that it takes the data.frame and a vector of the column names you want to operate on. In this version the function returns a list, each element of which is an aggregated data.frame:

compress <- function(df, cols) {
  # One aggregated data.frame per column name in cols
  lapply(cols, function(col) {
    aggregate(df[, col, drop = FALSE],
              list(hour = with(df, paste(dates(Time),
                                         sprintf("%d:00:00", hours(Time))))),
              sum, na.rm = TRUE)
  })
}

To run the function, call it with the data.frame and a vector of column names:

compress( df , names(df)[ grep("abc", names(df) ) ] ) 
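
To try the idea without any extra packages, the sketch below rebuilds the hour label with base R's format() on a POSIXct column. That is an assumption standing in for dates() and hours(), not the original code, and toy, compress_base and the column names are all hypothetical.

```r
# Hypothetical data: four timestamps spanning two hours, two "abc" columns
toy <- data.frame(
  Time  = as.POSIXct(c("2013-08-15 10:05:00", "2013-08-15 10:40:00",
                       "2013-08-15 11:10:00", "2013-08-15 11:55:00"),
                     tz = "UTC"),
  abc_1 = c(1, 2, 3, NA),
  abc_2 = c(10, 20, 30, 40),
  other = c(0, 0, 0, 0)
)

# Same structure as compress(), but the hour label comes from format()
# on a POSIXct column (an assumption replacing dates()/hours())
compress_base <- function(df, cols) {
  hour <- format(df$Time, "%Y-%m-%d %H:00:00")
  out <- lapply(cols, function(col) {
    aggregate(df[, col, drop = FALSE], list(hour = hour), sum, na.rm = TRUE)
  })
  names(out) <- cols   # name each list element after its column
  out
}

res <- compress_base(toy, grep("abc", names(toy), value = TRUE))

# Optionally merge the per-column results into a single data.frame
merged <- Reduce(function(a, b) merge(a, b, by = "hour"), res)
```

Here res is a named list with one aggregated data.frame per matching column, and merged joins them on the shared hour column. If a single table is all you need, aggregate also accepts several columns at once, which avoids the lapply and the merge.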

Simon O'Hanlon answered Oct 31 '22 09:10