
Loop through data frame and variable names

I am looking for a way to automate some diagrams in R using a for loop:

dflist <- c("dataframe1", "dataframe2", "dataframe3", "dataframe4")

for (i in dflist) {
  plot(i$var1, i$var2)
}

All dataframes have the same variables, i.e. var1, var2.

It seems for loops are not the most elegant solution here, but I don't understand how to use the apply functions for diagrams.

EDIT:

My original example using mean() didn't help in the original question, so I changed it to a plot function.

Timm S. asked May 23 '13 12:05


3 Answers

To further add to Beasterfield's answer, it seems like you want to do some number of complex operations on each of the data frames.

The function you pass to an apply call can be as complex as you like. So where you now have:

for (i in dflist) {
  # Do some complex things
}

This can be translated to:

lapply(dflist, function(df) {
  # Do some complex operations on each data frame, df
  # More steps

  # Make sure the last thing is NULL. The last statement within the function will be
  # returned to lapply, which will try to combine these as a list across all data frames.
  # You don't actually care about this, you just want to run the function.
  NULL
})

A more concrete example using plot:

# Assuming each data frame holds our points in the columns x and y
lapply(dflist, function(df) {
  x2 <- df$x^2
  log_y <- log(df$y)
  plot(x2, log_y)  # plot the transformed values computed above
  NULL
})
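
Note that the snippets above assume dflist is a list of data frames, whereas the question stores the names as strings. A minimal sketch of one way to bridge that gap, using mget() to look the names up; the sample data, the dfnames and out_file names, and the temporary PDF are assumptions for illustration only:

```r
# Hypothetical data standing in for the data frames named in the question
dataframe1 <- data.frame(var1 = 1:10, var2 = (1:10)^2)
dataframe2 <- data.frame(var1 = 1:10, var2 = sqrt(1:10))

# Character vector of names, as in the question
dfnames <- c("dataframe1", "dataframe2")

# mget() looks the names up and returns a named list of the data frames
dflist <- mget(dfnames)

# Write the plots to a temporary PDF (one page per data frame) so the
# sketch also runs outside an interactive session
out_file <- file.path(tempdir(), "dataframe_plots.pdf")
pdf(out_file)

# invisible() hides the list of NULLs that lapply() would otherwise print
invisible(lapply(names(dflist), function(nm) {
  df <- dflist[[nm]]
  plot(df$var1, df$var2, main = nm)  # use the name as the plot title
  NULL
}))

dev.off()
```

Iterating over names(dflist) rather than over the list itself keeps each name available for titles or file names.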

You can also write complex functions which take multiple arguments:

lapply(dflist, function(df, arg1, arg2) {
  # Do something on each data.frame, df
  # arg1 == 1, arg2 == 2 (see next line)
}, 1, 2) # extra arguments are passed in here
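
As a runnable sketch of the same idea: the helper count_in_range and its lower and upper arguments are made up for illustration, and naming the extra arguments in the lapply() call makes it clearer where each value ends up:

```r
set.seed(1)
dflist <- list(
  a = data.frame(x = runif(20), y = runif(20)),
  b = data.frame(x = runif(20), y = runif(20))
)

# Hypothetical helper: count the rows whose x lies in [lower, upper]
count_in_range <- function(df, lower, upper) {
  sum(df$x >= lower & df$x <= upper)
}

# lapply() passes lower and upper on to count_in_range for every data frame
counts <- lapply(dflist, count_in_range, lower = 0, upper = 1)
```

Since runif() draws values between 0 and 1, every row falls inside the range here, so each count equals the number of rows.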

Hope this helps you out!

Scott Ritchie answered Oct 08 '22 05:10


Concerning your actual question, you should learn how to access cells, rows and columns of data.frames, matrices or lists. From your code I guess you want to access the j-th column of the data.frame i, so it should read:

mean( i[,j] )
# or
mean( i[[ j ]] )

The $ operator can only be used if you write the name of a particular variable in your data.frame literally, e.g. i$var1; it does not work with a name stored in another variable such as j. Additionally, it is less performant than accessing by [, ] or [[ ]].
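
A small sketch of the difference, with a made-up data frame df and a column name stored in j (all names here are assumptions for illustration):

```r
df <- data.frame(var1 = c(1, 2, 3), var2 = c(10, 20, 30))
j <- "var2"  # column name stored in a variable

m_dollar  <- mean(df$var2)   # name typed literally after $
m_double  <- mean(df[[j]])   # column looked up via the value of j
m_single  <- mean(df[, j])   # same column, matrix-style indexing

# $ does not evaluate j; it looks for a column literally called "j"
no_such_col <- df$j          # NULL, because there is no such column
```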

However, although it's not wrong, using for loops is not very R-ish. You should read about vectorized functions and the apply family. So your code could easily be rewritten as:

set.seed(42)
dflist <- vector( "list", 5 )
for( i in 1:5 ){
  dflist[[i]] <- data.frame( A = rnorm(100), B = rnorm(100), C = rnorm(100) )
}
varlist <- c("A", "B")

lapply( dflist, function(x){ colMeans(x[varlist]) } )
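
If a single table of results is more convenient than a list, sapply() is one option. This is only a sketch: it rebuilds the same kind of sample data with lapply() instead of the for loop, and the name means is an assumption:

```r
set.seed(42)
# Build five sample data frames with columns A, B and C
dflist <- lapply(1:5, function(i) {
  data.frame(A = rnorm(100), B = rnorm(100), C = rnorm(100))
})
varlist <- c("A", "B")

# sapply() simplifies the per-data-frame results into a matrix:
# one row per variable in varlist, one column per data frame
means <- sapply(dflist, function(x) colMeans(x[varlist]))
```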

Beasterfield answered Oct 08 '22 05:10


set.seed(42)
dflist <- list(data.frame(x=runif(10),y=rnorm(10)),
               data.frame(x=rnorm(10),y=runif(10)))

par(mfrow=c(1,2))
for (i in dflist) {
  plot(y~x, data=i)
}

Roland answered Oct 08 '22 04:10