I want to apply a function to several columns of a data.frame whose names share a common pattern. To subset the data.frame I use this code, which works:
df[,grep("abc", colnames(df))]
But I don't know how to apply my function f(x) to all the columns that match this pattern, using either a for loop or lapply.
The function I'm using is:
compress = function(x) {
  aggregate(df[, x, drop=FALSE],
            list(hour = with(df, paste(dates(Time),
                                       sprintf("%d:00:00", hours(Time))))),
            sum, na.rm=TRUE)
}
where df (the data frame) and Time could be set as variables themselves but for the moment I don't need to do it.
Thanks Giulia
You've basically got it. Just use apply on the columns of your subsetted data to apply function f over columns (the 2 in the second argument of apply indicates columns, as opposed to 1, which indicates rows):
apply( df[,grep("abc", colnames(df))] , 2 , f )
Or, if you don't want to coerce your df to a matrix (which will happen with apply), you can use lapply, as you suggest, in much the same manner:
lapply( df[,grep("abc", colnames(df))] , f )
The return value from lapply will be a list, with one element for each column. You can turn this back into a data.frame by wrapping the lapply call in data.frame, e.g. data.frame( lapply(...) )
# This function just multiplies its argument by 2
f <- function(x) x * 2
df <- data.frame( AB = runif(5) , AC = runif(5) , BB = runif(5) )
apply( df[,grep("A", colnames(df))] , 2 , f )
#            AB        AC
#[1,] 0.4130628 1.3302304
#[2,] 0.2550633 0.1896813
#[3,] 1.5066157 0.7679393
#[4,] 1.7900907 0.5487673
#[5,] 0.7489256 1.6292801
data.frame( lapply( df[,grep("A", colnames(df))] , f ) )
#         AB        AC
#1 0.4130628 1.3302304
#2 0.2550633 0.1896813
#3 1.5066157 0.7679393
#4 1.7900907 0.5487673
#5 0.7489256 1.6292801
# Note the important difference between the two methods...
class( data.frame( lapply( df[,grep("A", colnames(df))] , f ) ) )
#[1] "data.frame"
class( apply( df[,grep("A", colnames(df))] , 2 , f ) )
#[1] "matrix" "array"   # in R >= 4.0.0; older versions print just "matrix"
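Since the question also asks about a for loop, here is a minimal sketch of the same operation written that way, next to the lapply form. It reuses the toy f from above on a small fixed data frame; the names cols, res, res_df and res_df2 are only illustrative. Note the drop = FALSE, which keeps the subset a data.frame even when only one column name matches the pattern (without it, df[, cols] collapses to a plain vector).

```r
# Toy function and data, mirroring the example above (fixed values for reproducibility)
f  <- function(x) x * 2
df <- data.frame(AB = c(1, 2, 3), AC = c(4, 5, 6), BB = c(7, 8, 9))

# Names of the columns matching the pattern
cols <- grep("A", colnames(df), value = TRUE)

# for-loop version: collect one result per matching column in a named list
res <- list()
for (nm in cols) {
  res[[nm]] <- f(df[[nm]])
}
res_df <- data.frame(res)

# lapply version; drop = FALSE keeps a data.frame even if only one column matches
res_df2 <- data.frame(lapply(df[, cols, drop = FALSE], f))

# Both approaches should give the same columns and values
all.equal(res_df, res_df2)
```

The loop and lapply do the same work here; lapply is simply shorter and returns the named list directly.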
For the example function you want to run, it might be easier to rewrite it as a function that takes the df as input and a vector of column names that you want to operate on. In this example the function returns a list, with each element of that list containing an aggregated data.frame:
compress = function( df , x ) {
  # dates() and hours() are presumably from the chron package (not stated in the question)
  lapply( x , function(col) {
    aggregate(df[, col, drop=FALSE],
              list(hour = with(df, paste(dates(Time),
                                         sprintf("%d:00:00", hours(Time))))),
              sum, na.rm=TRUE)
  })
}
To run the function, call it with the data.frame and a vector of column names:
compress( df , names(df)[ grep("abc", names(df) ) ] )
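As a rough, self-contained illustration of what such a call returns, the sketch below runs a variant of compress on a made-up data frame. It assumes Time is stored as POSIXct and swaps the dates()/hours() calls for base R's format(), so it does not reproduce the original time handling exactly. The function name compress_hourly, the columns abc1, abc2 and other, and all the values are hypothetical.

```r
# Hypothetical data: two "abc" columns plus one unrelated column,
# four readings spread over two hours
df <- data.frame(
  Time  = as.POSIXct(c("2013-01-01 10:05:00", "2013-01-01 10:40:00",
                       "2013-01-01 11:10:00", "2013-01-01 11:50:00"), tz = "UTC"),
  abc1  = c(1, 2, 3, NA),
  abc2  = c(10, 20, 30, 40),
  other = c(5, 5, 5, 5)
)

# Variant of compress(): base R format() builds the hourly label
# (assumption: Time is POSIXct, unlike the original dates()/hours() version)
compress_hourly <- function(df, cols) {
  hour <- format(df$Time, "%Y-%m-%d %H:00:00")
  out <- lapply(cols, function(col) {
    aggregate(df[, col, drop = FALSE], list(hour = hour), sum, na.rm = TRUE)
  })
  names(out) <- cols  # name each list element after its column
  out
}

res <- compress_hourly(df, grep("abc", names(df), value = TRUE))
res$abc1  # one aggregated data.frame per matching column
```

Naming the list elements after the columns makes it easy to pull out a single result, such as res$abc2, later on.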