
With R, loop over data frames, and assign appropriate names to objects created in the loop

Tags:

r

This is something data analysts do all the time, especially when working with survey data that features missing responses. It's common to first multiply impute a set of complete data matrices, fit models to each of these matrices, and then combine the results. At the moment I'm doing things by hand and am looking for a more elegant solution.

Imagine there are 5 *.csv files in the working directory, named dat1.csv, dat2.csv, ..., dat5.csv. I want to estimate the same linear model using each data set.

Given this answer, a first step is to gather a list of the files, which I do with the following:

csvdat <- list.files(pattern="dat.*csv")

Now I want to do something like

for(x in csvdat) {
    lm.which(csvdat == "x") <- lm(y ~ x1 + x2, data = x)
}

The "which" statement is my silly way of trying to number each model in turn, using the position in the csvdat list that the loop is currently up to. That is, I'd like this loop to return a set of 5 lm objects with the names lm.1, lm.2, etc.

Is there some simple way to create these objects, and name them so that I can easily indicate which data set they correspond to?

Thanks for your help!

tomw asked May 26 '11 21:05



2 Answers

Another approach is to use the plyr package to do the looping. Using the example constructed by @chl, here is how you would do it:

library(plyr)

# read csv files into a list of data frames
data_frames <- llply(csvdat, read.csv)

# run the same regression model on each data frame
regressions <- llply(data_frames, lm, formula = y ~ .)
names(regressions) <- csvdat
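
For readers without plyr, the same read-then-fit pattern can be sketched in base R with lapply. This is only an illustration under stated assumptions: the temporary directory, the simulated data, and the columns x1 and x2 are made up for the example, and the models are named lm.1, lm.2, ... as the question asks instead of by file name.

```r
# Simulate two small data sets in a temporary directory (illustrative data only)
set.seed(1)
tmp <- tempfile("lmdemo")
dir.create(tmp)
for (i in 1:2) {
  d <- data.frame(x1 = rnorm(20), x2 = rnorm(20))
  d$y <- d$x1 + d$x2 + rnorm(20)
  write.csv(d, file.path(tmp, paste0("dat", i, ".csv")), row.names = FALSE)
}

# List the files, read each one, and fit the same model to each data frame
csvdat <- list.files(tmp, pattern = "^dat[0-9]+\\.csv$", full.names = TRUE)
data_frames <- lapply(csvdat, read.csv)
regressions <- lapply(data_frames, function(d) lm(y ~ x1 + x2, data = d))

# Name the models lm.1, lm.2, ... in the order the files were listed
names(regressions) <- paste0("lm.", seq_along(regressions))
```

Individual fits are then available as regressions$lm.1, regressions[["lm.2"]], and so on. Note that list.files returns names in alphabetical order, so with ten or more files dat10.csv would sort before dat2.csv.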
Ramnath answered Oct 14 '22 22:10


Use a list to store the results of your regression models as well, e.g.

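# simulate n observations: two standard-normal predictors and a response y
foo <- function(n) {
  X <- as.data.frame(replicate(2, rnorm(n)))  # columns V1 and V2
  transform(X, y = V1 + V2 + rnorm(n))
}
# row.names = FALSE keeps the row index out of the file, so y ~ . uses only V1 and V2
write.csv(foo(10), file = "dat1.csv", row.names = FALSE)
write.csv(foo(10), file = "dat2.csv", row.names = FALSE)
csvdat <- list.files(pattern = "dat.*csv")
lm.res <- list()
for (i in seq_along(csvdat))
  lm.res[[i]] <- lm(y ~ ., data = read.csv(csvdat[i]))
names(lm.res) <- csvdat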
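
Once the models sit in a named list, picking one out or combining them takes a line or two. The sketch below is a hedged illustration, not part of the original answer: it rebuilds a stand-in lm.res from simulated in-memory data (the sim helper and the file-style names are hypothetical) and only averages the point estimates, which is far simpler than properly pooling standard errors across imputed data sets.

```r
set.seed(42)

# Stand-in for the list built above: three simulated data sets, one model each
sim <- function(n) {
  d <- data.frame(V1 = rnorm(n), V2 = rnorm(n))
  d$y <- d$V1 + d$V2 + rnorm(n)
  d
}
lm.res <- lapply(1:3, function(i) lm(y ~ ., data = sim(50)))
names(lm.res) <- paste0("dat", 1:3, ".csv")

# Pull out one model by the name of its data file
fit2 <- lm.res[["dat2.csv"]]

# One column of coefficients per model, one row per term
coefs <- sapply(lm.res, coef)

# Average each coefficient across the data sets (point estimates only)
pooled <- rowMeans(coefs)
```

Because the list is named after the files, the columns of coefs are labelled dat1.csv, dat2.csv, and so on, which makes it easy to see which data set each estimate came from.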
chl answered Oct 14 '22 20:10