I want to create a function which loops through a large number of files, calculates the number of complete cases for each file and then appends a new row to an existing data frame with the "ID" number of the file and its corresponding number of complete cases.
The code below only returns the last row of the data frame. I believe this is because R overwrites my data frame in every iteration of the loop, but I am not sure. I have done a lot of research online on how to solve this, but I could not find an easy solution (I am very, very new to R).
Below you can see my code and the output I get:
```r
complete <- function(directory = "specdata", id = 1:332) {
  files_list <- list.files("specdata", full.names = T) # creates a list of files
  dat <- data.frame() # creates an empty data frame
  for (i in id) {
    data <- read.csv(files_list[i]) # reads the file "i" in the id vector
    nobs <- sum(complete.cases(data)) # counts the number of complete cases in that file
    data_frame <- data.frame("ID" = i, nobs) # here I want to store the number of complete cases in a data frame
    output <- rbind(dat, data_frame) # here the data_frame should be added to an existing data frame
  }
  print(output)
}
```
When I run `complete( , 3:5)`, I get the following result:

```
  ID nobs
1  5  402
```
Thanks for your help! :)
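As Maxim.K said, there are better ways to do this, but the actual problem here is that your `output` variable gets overwritten at each iteration of the `for` loop: `rbind(dat, data_frame)` always binds the new row to `dat`, which stays empty, so `output` only ever holds the row for the current file.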
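A file-free toy version of the loop makes the overwriting visible. The `ids` and `nobs` values below are made up and only stand in for what `read.csv()` and `complete.cases()` would produce; the loop structure mirrors the question's code.

```r
# Made-up per-file results standing in for read.csv() + complete.cases().
ids  <- 3:5
nobs <- c(10, 20, 30)

dat <- data.frame()                  # empty, and never reassigned below
for (k in seq_along(ids)) {
  row    <- data.frame(ID = ids[k], nobs = nobs[k])
  output <- rbind(dat, row)          # binds onto the still-empty `dat` every time
}

nrow(dat)      # 0: `dat` never grew
print(output)  # a single row, from the last iteration (ID 5)
```

Because `dat` is never reassigned, every iteration starts again from an empty data frame, and the last assignment to `output` wins.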
Try:

```r
dat <- rbind(dat, data_frame)
```

and print `dat` instead of `output`.
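Putting that fix back into the question's function gives roughly the following. It is a sketch, not tested against the real `specdata` files. It reads from `directory` instead of the hardcoded `"specdata"` (which the original silently ignored), returns `dat` instead of printing it, and assumes, as the question's code does, that `files_list[i]` is the file for ID `i`. `complete2()` is a hypothetical name for one possible "better way" (the answer does not say which one it means): collect all counts with `vapply()` and build the data frame once, instead of growing it row by row. The demo CSV files are made up.

```r
# Fixed loop: accumulate into `dat` instead of overwriting `output`.
# Assumption: files_list[i] is the file for ID i (same as the question's code).
complete <- function(directory = "specdata", id = 1:332) {
  files_list <- list.files(directory, full.names = TRUE)
  dat <- data.frame()                                  # start empty
  for (i in id) {
    data <- read.csv(files_list[i])                    # read file number i
    nobs <- sum(complete.cases(data))                  # rows with no NA in any column
    dat  <- rbind(dat, data.frame(ID = i, nobs = nobs)) # grow `dat` by one row
  }
  dat                                                  # return the result
}

# Hypothetical alternative: compute every count first, build the data frame once.
complete2 <- function(directory = "specdata", id = 1:332) {
  files_list <- list.files(directory, full.names = TRUE)
  nobs <- vapply(id, function(i) sum(complete.cases(read.csv(files_list[i]))),
                 integer(1))
  data.frame(ID = id, nobs = nobs)
}

# Demo on three small, made-up CSV files in a temporary directory.
demo_dir <- file.path(tempdir(), "specdata_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(a = c(1, NA, 3), b = c(1, 2, 3)),
          file.path(demo_dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(a = c(1, 2, 3), b = c(1, 2, 3)),
          file.path(demo_dir, "002.csv"), row.names = FALSE)
write.csv(data.frame(a = c(NA, NA, 3), b = c(1, 2, NA)),
          file.path(demo_dir, "003.csv"), row.names = FALSE)

res <- complete(demo_dir, 2:3)
print(res)  # ID 2 has 3 complete rows, ID 3 has 0
```

Growing a data frame with `rbind()` inside a loop copies it on every iteration, which is fine for a few hundred files but gets slow for much larger inputs; that is the usual argument for the `vapply()` style.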