 

Looping linear models for multiple files in directory

I have a folder with 26 .csv files in it. Each file has two columns with the headers DO2 and Time_min, and each has at least 300 rows.

For each file I want to make a scatterplot with x = Time_min and y = DO2, fit a linear model, and then collect the slope coefficient and R^2 of all 26 models in one table.

This is the code I have written so far. I know I could just copy and paste it 26 times, but there has to be a smarter way.

setwd("~/Documents/Masters/Data/R/35789/35789_Ucrit")

#The file where I want all the coefficients and R^2 to go
UE_Slope <- read.csv("~/Documents/Masters/Data/R/35789/35789_UE_Slope.csv")

temp = list.files(pattern="\\.csv$")  # regex: file names ending in .csv
for (i in 1:length(temp))(assign(temp[i], read.csv(temp[i])))

# The files are named Seal1.csv to Seal26.csv; assign() names each object after its file, e.g. Seal1.csv
plot(DO2 ~ Time_min, data = Seal1.csv)
model1 <- lm(DO2 ~ Time_min, data = Seal1.csv)
UE_Slope <- rbind(UE_Slope, data.frame("Slope"=coef(model1)[[2]], "R.2"=summary(model1)$r.squared))
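A common alternative to creating 26 separate objects with assign() is to read every file into one named list and loop over it with lapply(). The sketch below assumes each CSV has DO2 and Time_min columns; to keep it runnable anywhere it first writes three small synthetic files to a temporary folder (the folder, the data values, and the names `seal_data`, `fits`, and `slopes` are all hypothetical).

```r
# Hypothetical demo data: three tiny CSV files in a temporary folder.
# Replace data_dir with your own folder (e.g. your 35789_Ucrit directory).
data_dir <- file.path(tempdir(), "seal_demo")
dir.create(data_dir, showWarnings = FALSE)
for (k in 1:3) {
  t_min <- 0:9
  demo <- data.frame(DO2 = 100 - k * t_min, Time_min = t_min)
  write.csv(demo, file.path(data_dir, paste0("Seal", k, ".csv")), row.names = FALSE)
}

# Read every CSV into one named list instead of many separate objects
files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)
seal_data <- lapply(files, read.csv)
names(seal_data) <- basename(files)

# Fit one model per data frame; each result stays paired with its file name
fits <- lapply(seal_data, function(d) lm(DO2 ~ Time_min, data = d))
slopes <- sapply(fits, function(m) coef(m)[["Time_min"]])
```

Keeping everything in a list means the file name travels with each model, so there is no need to build object names as strings.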
asked Feb 12 '26 03:02 by Krista



1 Answer

We first define a function that reads one CSV file, fits a linear model, and computes the summary statistics (slope and R^2).

f <- function (file) {
  ## read file
  dat <- read.csv(file)
  ## fit model
  fit <- lm(DO2 ~ Time_min, data = dat)
  slope <- coef(fit)[[2]]  ## [[ ]] drops the name, so the result is labelled "Slope", not "Slope.Time_min"
  ## make a scatterplot
  plot(DO2 ~ Time_min, data = dat, main = file)  ## use file names as title
  abline(fit)  ## overlay fitted regression line
  ## note: summary.lm() is not used, because it computes much more than we need;
  ## R-squared is easy to compute directly as 1 - RSS / TSS
  RSS <- crossprod(fit$residuals)[1]
  TSS <- crossprod(dat$DO2 - mean(dat$DO2))[1]
  R2 <- 1 - RSS / TSS
  ## return a vector
  c("Slope" = slope, "R.2" = R2)
}
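To check that the hand-computed R^2 inside f() agrees with what summary.lm() reports, here is a small sketch on synthetic data (the data frame and its values are made up for illustration). The identity R^2 = 1 - RSS / TSS matches summary(fit)$r.squared for models that include an intercept, which DO2 ~ Time_min does.

```r
# Synthetic data with some scatter (hypothetical values, illustration only)
set.seed(1)
dat <- data.frame(Time_min = 1:50)
dat$DO2 <- 95 - 0.4 * dat$Time_min + rnorm(50, sd = 2)

fit <- lm(DO2 ~ Time_min, data = dat)

# R^2 computed by hand, exactly as in f()
RSS <- crossprod(fit$residuals)[1]
TSS <- crossprod(dat$DO2 - mean(dat$DO2))[1]
R2_manual <- 1 - RSS / TSS

# R^2 as reported by summary.lm()
R2_summary <- summary(fit)$r.squared

all.equal(R2_manual, R2_summary)  # TRUE for a model with an intercept
```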

Now, we simply loop through all files, applying f:

temp <- list.files(pattern = "\\.csv$")  ## regex: file names ending in .csv
pdf("whatever.pdf")
result <- t(sapply(temp, f))
dev.off()

sapply() binds the per-file result vectors column by column, so it returns a wide 2 × 26 matrix; t() transposes it into a tall 26 × 2 matrix with one row per file. pdf() opens a PDF graphics device and dev.off() closes it; every plot drawn in between goes into that file, one plot per page. This is convenient here, because 26 figures are hard to show together as panels on screen. The PDF file is written to your current working directory.
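A small sketch of the shapes involved, using a toy stand-in `g()` for f() so that no files are needed (the function `g`, the file names, and its numbers are hypothetical). Because `temp` is a character vector, sapply() also uses the file names as column names, so after t() each row is labelled by its file. The last lines show one optional way to turn the matrix into a data frame and save it as a CSV table; the output name is an assumption.

```r
# Toy stand-in for f(): returns a named length-2 vector for each "file".
# The names and numbers are hypothetical; no files are read.
g <- function(file) c(Slope = -nchar(file) / 10, R.2 = 0.9)

files <- c("Seal1.csv", "Seal2.csv", "Seal3.csv")

wide <- sapply(files, g)  # one column per file, columns named by file
dim(wide)                 # 2 3

result <- t(wide)         # one row per file, rows named by file
dim(result)               # 3 2

# Optional: make the file name an ordinary column and save the table.
# "UE_Slope_all.csv" is a hypothetical output file name.
slope_table <- data.frame(File = rownames(result), result, row.names = NULL)
out_file <- file.path(tempdir(), "UE_Slope_all.csv")
write.csv(slope_table, out_file, row.names = FALSE)
```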

answered Feb 15 '26 11:02 by Zheyuan Li
