Iterating a function through different columns of a data.frame matching a pattern in the column names

I want to apply a function to several columns of a data.frame whose names share a common pattern. To subset the data.frame I use this code, which works:

df[,grep("abc", colnames(df))]

But I don't know how to apply my function f(x) to all the columns that match this pattern, using either a for loop or lapply.

The function I'm using is:

# dates() and hours() are assumed to come from the chron package
compress <- function(x) {
  aggregate(df[, x, drop = FALSE],
            list(hour = with(df, paste(dates(Time),
                                       sprintf("%d:00:00", hours(Time))))),
            sum, na.rm = TRUE)
}

Here df (the data frame) and Time could be passed in as arguments, but for the moment I don't need that.

Thanks, Giulia

asked Aug 15 '13 12:08

Giulia

1 Answer

You've basically got it. Use apply on the subsetted data to apply function f over its columns. The 2 in the second argument tells apply to work over columns; 1 would apply f over rows:

apply( df[,grep("abc", colnames(df))] , 2 , f )

Or, if you don't want your data.frame coerced to a matrix (which apply does), you can use lapply in much the same way, as you suggest:

lapply( df[,grep("abc", colnames(df))] , f )

The return value from lapply will be a list, with one element for each column. You can turn this back into a data.frame by wrapping the lapply call in data.frame(), e.g. data.frame( lapply(...) ).
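One caveat with the subsetting used here: if only one column matches the pattern, `df[, idx]` returns a plain vector rather than a data.frame, so lapply would iterate over individual values and apply would fail. A minimal sketch of a safer variant adds `drop = FALSE`. The toy data.frame `d` and the index name `idx` are assumptions for illustration only.

```r
# Toy data, assumed for illustration; only one column name contains "A"
d <- data.frame(AB = c(1, 2, 3), BB = c(4, 5, 6), BC = c(7, 8, 9))
f <- function(x) x * 2

idx <- grep("A", colnames(d))

# Without drop = FALSE a single matching column collapses to a vector
is.data.frame(d[, idx])
#[1] FALSE

# With drop = FALSE the result stays a one-column data.frame
sub <- d[, idx, drop = FALSE]
is.data.frame(sub)
#[1] TRUE

res <- data.frame(lapply(sub, f))
res
#  AB
#1  2
#2  4
#3  6
```

With two or more matching columns the result is the same with or without `drop = FALSE`, so adding it costs nothing.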

Example

# This function just multiplies its argument by 2
f <- function(x) x * 2

df <- data.frame( AB = runif(5) , AC = runif(5) , BB = runif(5) )


apply( df[,grep("A", colnames(df))] , 2 , f )
#            AB        AC
#[1,] 0.4130628 1.3302304
#[2,] 0.2550633 0.1896813
#[3,] 1.5066157 0.7679393
#[4,] 1.7900907 0.5487673
#[5,] 0.7489256 1.6292801


data.frame( lapply( df[,grep("A", colnames(df))] , f ) )
#         AB        AC
#1 0.4130628 1.3302304
#2 0.2550633 0.1896813
#3 1.5066157 0.7679393
#4 1.7900907 0.5487673
#5 0.7489256 1.6292801

# Note the important difference between the two methods...
class( data.frame( lapply( df[,grep("A", colnames(df))] , f ) ) )
#[1] "data.frame"
class( apply( df[,grep("A", colnames(df))] , 2 , f ) )
#[1] "matrix" "array"    (just "matrix" in R versions before 4.0.0)
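Since the question also mentions a for loop, here is a minimal sketch of that route. It updates the matching columns in place and leaves the others untouched. The data.frame `d`, its fixed values, and the name `cols` are assumptions for illustration.

```r
f <- function(x) x * 2
d <- data.frame(AB = c(1, 2), AC = c(3, 4), BB = c(5, 6))

# Names of the columns whose name contains "A"
cols <- grep("A", colnames(d), value = TRUE)

# Apply f to each matching column and write the result back
for (nm in cols) {
  d[[nm]] <- f(d[[nm]])
}

d
#  AB AC BB
#1  2  6  5
#2  4  8  6
```

The same in-place update can be written without a loop as `d[cols] <- lapply(d[cols], f)`.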

Second edit

For the example function you want to run, it might be easier to rewrite it as a function that takes the df as input and a vector of column names that you want to operate on. In this example the function returns a list, with each element of that list containing an aggregated data.frame:

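compress <- function( df , x ) {
  # x is a character vector of column names; one aggregated data.frame is returned per name
  lapply( x , function(nm) {
    aggregate(df[, nm, drop = FALSE],
              list(hour = with(df, paste(dates(Time),
                                         sprintf("%d:00:00", hours(Time))))),
              sum, na.rm = TRUE)
  })
}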

To run the function, call it with the data.frame and a vector of column names:

compress( df , names(df)[ grep("abc", names(df) ) ] )
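To try the idea end to end without extra packages, here is a self-contained sketch. It assumes Time is a POSIXct column and builds the hour label with base R's `format()` instead of `dates()`/`hours()`. The function name `compress_hourly`, the toy data.frame `d`, and its column names are hypothetical.

```r
# Hypothetical variant of compress(); format() on POSIXct replaces the
# dates()/hours() calls, which is an assumption about what Time holds
compress_hourly <- function(df, cols) {
  hour <- format(df$Time, "%Y-%m-%d %H:00:00")
  out <- lapply(cols, function(nm) {
    aggregate(df[, nm, drop = FALSE], list(hour = hour), sum, na.rm = TRUE)
  })
  setNames(out, cols)   # name each list element after its column
}

# Toy data: four readings spread over two hours
d <- data.frame(
  Time  = as.POSIXct(c("2013-08-15 10:05:00", "2013-08-15 10:40:00",
                       "2013-08-15 11:10:00", "2013-08-15 11:50:00"),
                     tz = "UTC"),
  abc_1 = c(1, 2, 3, 4),
  abc_2 = c(10, NA, 30, 40),
  other = c(0, 0, 0, 0)
)

res <- compress_hourly(d, grep("abc", names(d), value = TRUE))

# One aggregated data.frame per matching column
names(res)
res$abc_2   # hourly sums 10 and 70; the NA is dropped by na.rm = TRUE
```

Naming the list elements makes it easy to pull out the result for a given column, e.g. `res$abc_1`.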

answered Oct 31 '22 09:10

Simon O'Hanlon

Tags:

for-loop

r

lapply