I want to write a function that loops through a large number of files, counts the complete cases in each file, and appends a row to an existing data frame containing the file's "ID" number and its number of complete cases.
The code below returns only the last row of the data frame. I believe this happens because R overwrites my data frame on every iteration of the loop, but I am not sure. I have searched online for a solution but could not find an easy one (I am very new to R).
Here are my code and the output I get:
complete <- function(directory = "specdata", id = 1:332) {
  files_list <- list.files("specdata", full.names = T) # creates a list of files
  dat <- data.frame() # creates an empty data frame
  for (i in id) {
    data <- read.csv(files_list[i]) # reads the file "i" in the id vector 
    nobs <- sum(complete.cases(data)) # counts the number of complete cases in that file  
    data_frame <- data.frame("ID" = i, nobs) # here I want to store the number of complete cases in a data frame
    output <- rbind(dat, data_frame) # here the data_frame should be added to an existing data frame
  }
  print(output)
}
When I run complete( , 3:5), I get the following result: 
  ID nobs
1  5  402
Thanks for your help! :)
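As Maxim.K said, there are better ways to do this, but the actual problem here is that your output variable gets overwritten at each iteration of the for loop. Because dat is never updated, each call to rbind(dat, data_frame) combines the still-empty dat with only the current row.
Try:
dat <- rbind(dat, data_frame)
and print dat after the loop.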
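For reference, here is a minimal sketch of the whole function with that one-line fix applied. Two further changes are my own assumptions rather than part of the original code: list.files() uses the directory argument instead of the hard-coded "specdata", and the function returns dat instead of printing it. The demo directory and the CSV files written into it are hypothetical test data, only there so that the example runs on its own.

```r
# Sketch of complete() that accumulates rows in `dat` across iterations.
complete <- function(directory = "specdata", id = 1:332) {
  # Assumption: use the `directory` argument rather than a hard-coded path
  files_list <- list.files(directory, full.names = TRUE)
  dat <- data.frame()  # accumulator, starts empty
  for (i in id) {
    data <- read.csv(files_list[i])      # read the i-th file
    nobs <- sum(complete.cases(data))    # count rows without any NA
    # Bind the new row onto the accumulator and assign the result back to it
    dat <- rbind(dat, data.frame(ID = i, nobs = nobs))
  }
  dat  # return the accumulated data frame
}

# Hypothetical demo data: three small CSV files in a temporary directory
demo_dir <- file.path(tempdir(), "specdata_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (k in 1:3) {
  df <- data.frame(x = c(1, NA, 3), y = c(k, 2, NA))
  write.csv(df, file.path(demo_dir, sprintf("%03d.csv", k)), row.names = FALSE)
}

result <- complete(demo_dir, 2:3)
print(result)
```

With this version the result has one row per requested id rather than only the last one. Growing a data frame with rbind inside a loop is fine for a few hundred files; for much larger inputs, building a list of rows and combining them once with do.call(rbind, ...) is a common alternative.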