I am stuck. I need a way to iterate through a bunch of subfolders in a directory, pull out the 4 .csv files in each, bind the contents of those 4 .csv files, then write the new .csv to a new directory, using the name of the subfolder as the name of the new .csv.
I know R can do this, but I am stuck on how to iterate across the subfolders and bind the csv files together. My obstacle is that each subfolder contains the same 4 .csv files, named with the same 8-digit ids. For example, subfolder A contains 09061234.csv, 09061345.csv, 09061456.csv, and 09061560.csv. Subfolder B contains 09061234.csv, 09061345.csv, 09061456.csv, and 09061560.csv. (...). There are 42 subfolders, and hence 168 csv files with the same names. I want to compact the files down to 42.
I can use `list.files` to retrieve all the subfolders. But then what?
```r
##Get Files from directory
TF <- "H:/working/TC/TMS/Counts/June09"
##List Sub folders
SF <- list.files(TF)
##List of File names inside folders (paths must include TF)
FN <- list.files(file.path(TF, SF))
#Returns list of 168 filenames
###?????###
#How to iterate through each subfolder, read each 8-digit integer id file,
#bind them all together into one single csv,
#Then write to new directory using
#the name of the subfolder as the name of the new csv?
```
There is probably an easy way to do this, but I am a noob with R. Something involving functions, `paste` and `write.table`, perhaps? Any hints, help, or suggestions are greatly appreciated. Thanks!
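For the task exactly as described (one output file per subfolder, named after that subfolder), a minimal base-R sketch could look like this. It assumes that all .csv files inside one subfolder share the same columns. `bind_subfolders()` is a hypothetical helper name, and the small demo tree under `tempdir()` is made-up data standing in for `H:/working/TC/TMS/Counts/June09`.

```r
# Hypothetical helper: bind every .csv in each immediate subfolder of `root`
# and write one file per subfolder to `out_dir`, named after the subfolder.
# Assumes the .csv files within a subfolder all have the same columns.
bind_subfolders <- function(root, out_dir) {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  # Immediate subfolders only (not root itself, not deeper levels)
  subfolders <- list.dirs(root, full.names = TRUE, recursive = FALSE)
  for (sf in subfolders) {
    # Only names ending in .csv, so stray .xls files are ignored
    csvs <- list.files(sf, pattern = "\\.csv$", full.names = TRUE)
    if (length(csvs) == 0) next
    dat <- do.call(rbind, lapply(csvs, read.csv))
    # e.g. subfolder "A" -> out_dir/A.csv
    write.csv(dat, file.path(out_dir, paste0(basename(sf), ".csv")),
              row.names = FALSE)
  }
  invisible(list.files(out_dir, pattern = "\\.csv$"))
}

# Demo: a throwaway tree with two subfolders holding two id files each
root <- file.path(tempdir(), "June09")
for (sub in c("A", "B")) {
  dir.create(file.path(root, sub), recursive = TRUE, showWarnings = FALSE)
  for (id in c("09061234", "09061345")) {
    write.csv(data.frame(id = id, count = 1:2),
              file.path(root, sub, paste0(id, ".csv")), row.names = FALSE)
  }
}
out <- file.path(tempdir(), "June09Output")
written <- bind_subfolders(root, out)
print(written)  # expected: "A.csv" "B.csv"
```

With the real data, the call would be `bind_subfolders(TF, <output directory>)`, which should yield 42 files.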
Step 1: set the directory that holds the small datasets. Step 2: read each file by looping over the output of `list.files()`. Step 3: append the data using `rbind()`.
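The three steps map onto a short loop. This is a sketch only: the directory and the three part files are hypothetical demo data created in `tempdir()`. Growing a data frame with `rbind()` inside a loop is fine for a handful of files but gets slow for many.

```r
# Step 1: set the directory holding the small datasets (hypothetical demo dir)
data_dir <- file.path(tempdir(), "small_datasets")
dir.create(data_dir, showWarnings = FALSE)
for (i in 1:3) {
  write.csv(data.frame(file = i, value = i * 10),
            file.path(data_dir, paste0("part", i, ".csv")), row.names = FALSE)
}

# Step 2: loop over every .csv file in the directory
combined <- NULL
for (f in list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)) {
  # Step 3: append this file's rows; rbind(NULL, df) simply returns df
  combined <- rbind(combined, read.csv(f))
}
print(combined)
```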
Using base R `read.csv()`: the base R function `read.csv()` imports a CSV file into a data frame. You can also use it to import multiple CSV files at once.
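Since `read.csv()` reads one file per call, the usual way to import several files "at once" is to apply it over a vector of paths and stack the results. A minimal sketch, using two hypothetical demo files:

```r
# Two small demo files (hypothetical data)
csv_dir <- file.path(tempdir(), "multi_csv")
dir.create(csv_dir, showWarnings = FALSE)
write.csv(data.frame(x = 1:2), file.path(csv_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:5), file.path(csv_dir, "b.csv"), row.names = FALSE)

files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)
# lapply() calls read.csv() on each path and returns a list of data frames
dfs <- lapply(files, read.csv)
# do.call(rbind, dfs) is the same as rbind(dfs[[1]], dfs[[2]], ...)
all_rows <- do.call(rbind, dfs)
print(nrow(all_rows))
```

This avoids growing a data frame inside a loop, which is why it is the idiom used in the answers below.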
You can use the `recursive = TRUE` option of `list.files`:
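```r
## Directory that holds the subfolders
TF <- "H:/working/TC/TMS/Counts/June09"
lapply(c('1234', '1345', '1456', '1560'), function(x) {
  ## find this id's file in every subfolder (pattern is a regex, not a glob)
  sources.files <- list.files(path = TF,
                              recursive = TRUE,
                              pattern = paste0('^0906', x, '\\.csv$'),
                              full.names = TRUE)
  ## read all files with this id and bind them
  dat <- do.call(rbind, lapply(sources.files, read.csv))
  ## write one aggregated file for this id
  write.csv(dat, paste0('agg', x, '.csv'), row.names = FALSE)
})
```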
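The `pattern` argument of `list.files()` is a regular expression matched against file names, not a shell glob, so `'*09061*'` does not mean "anything, then 09061". A quick way to check a pattern before pointing it at real folders is `grepl()` on some sample names. The names below are made-up examples; the pattern combines the four ids from the question into one regex.

```r
ids <- c("1234", "1345", "1456", "1560")
# ^ and $ anchor the whole name; \\. is a literal dot; (a|b|...) is alternation
pat <- paste0("^0906(", paste(ids, collapse = "|"), ")\\.csv$")
cat(pat, "\n")  # ^0906(1234|1345|1456|1560)\.csv$

# Hypothetical file names to test the pattern against
candidates <- c("09061234.csv", "09061345.csv", "09061234.xls",
                "notes.txt", "109061234.csv")
matched <- candidates[grepl(pat, candidates)]
print(matched)
```

Only the two `.csv` names with a listed id survive; the `.xls` file and the name with an extra leading digit are rejected.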
After some tweaking of agstudy's code, I came up with the solution I was ultimately after. The missing pieces were mostly due to the nature of my specific problem, so I am leaving agstudy's answer as "accepted".
Turns out a function really wasn't needed. At least not for now. If I need to perform this same task again, I will create a function out of it. For now, I can solve this particular problem without it.
Also, in my case I needed a conditional "if" statement to handle any non-csv files (such as .xls files) that may live in the subfolders, so that only comma-separated files are read.
Code:
```r
##Define directory path##
TF <- "H:/working/TC/TMS/Counts/June09"
##Define the list of ids to search for##
x <- c('1234', '1345', '1456', '1560')
##Regex: "0906" + one of the ids + ".csv", e.g. 09061234.csv##
pat <- paste0("^0906(", paste(x, collapse = "|"), ")\\.csv$")
##List matching files in every subfolder; .xls and other non-csv files are skipped##
sources.files <- list.files(TF, recursive = TRUE, pattern = pat, full.names = TRUE)
##Conditional: stop early if nothing matched##
if (length(sources.files) == 0) stop("No matching .csv files found under ", TF)
dat <- do.call(rbind, lapply(sources.files, read.csv))
write.table(dat, file = "H:/working/TC/TMS/June09Output/June09Batched.csv", row.names = FALSE, sep = ",")
```
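Skipping non-csv files can also be done without relying on the file-name pattern: filter by extension, and wrap `read.csv()` so that a file it cannot parse is skipped with a warning instead of aborting the whole run. A sketch under these assumptions: `read_or_skip()` is a hypothetical helper, and the mixed demo folder (one good .csv, one .xls, one empty .csv) is made-up data.

```r
mixed_dir <- file.path(tempdir(), "mixed_files")
dir.create(mixed_dir, showWarnings = FALSE)
write.csv(data.frame(v = 1:2), file.path(mixed_dir, "09061234.csv"), row.names = FALSE)
writeLines("not a spreadsheet", file.path(mixed_dir, "09061234.xls"))
writeLines(character(0), file.path(mixed_dir, "09061345.csv"))  # empty file

all_files <- list.files(mixed_dir, full.names = TRUE)
# Keep only files whose extension is "csv" (case-insensitive)
csv_files <- all_files[tolower(tools::file_ext(all_files)) == "csv"]

# Hypothetical helper: return NULL (and warn) if read.csv() fails on a file
read_or_skip <- function(f) {
  tryCatch(read.csv(f), error = function(e) {
    warning("Skipping ", basename(f), ": ", conditionMessage(e))
    NULL
  })
}
# rbind() drops the NULL entries, so only readable files are combined
dat <- do.call(rbind, lapply(csv_files, read_or_skip))
print(nrow(dat))
```

Here the empty `.csv` produces a warning and is skipped, and the `.xls` file is never read at all.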