Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to use R to Iterate through Subfolders and bind CSV files of the same ID?

Tags:

r

I am stuck. I need a way to iterate through a bunch of subfolders in a directory, pull out 4 .csv files , bind the contents of those 4 .csv files, then write out the new .csv to a new directory using the name of the initial subfolder as the name of the new .csv.

I know R could do this. But I am stuck at how to iterate across the subfolders and bind the csv files together. My obstacle is that each subfolder contains the same 4 .csv files using the same 8-digit id. For example, subfolder A contains 09061234.csv, 09061345.csv, 09061456.csv, and 09061560.csv. subfolder B contains 9061234.csv, 09061345.csv, 09061456.csv, and 09061560.csv. (...). There are 42 subfolders, and hence 168 csv files with the same names. I want to compact the files down to 42.

I can use list.files to retrieve all the subfolders. But then what?

##Get Files from directory
TF = "H:/working/TC/TMS/Counts/June09" 
##List Sub folders
SF <- list.files(TF)
##List of File names inside folders
FN <- list.files(SF)
#Returns list of 168 filenames

###?????###
#How to iterate through each subfolder, read each 8-digit integer id file, 
#bind them all together into one single csv, 
#Then write to new directory using 
#the name of the subfolder as the name of the new csv?

There is probably a way to do this easily but I am a noob with R. Something involving functions, paste and write.table perhaps? Any hints/help/suggestions is greatly appreciated. Thanks!

like image 422
myClone Avatar asked Mar 02 '13 01:03

myClone


People also ask

How to combine all csv files from the same folder into one data frame automatically with R?

Step 1, we set the directory of small datasets. Step 2, we read each file by looping it with list. files() command. Step 3, we append the data using rbind command.

Can R read multiple CSV files?

Using R Base read. R base function provides read. csv() to import a CSV file into DataFrame. You can also use to this to import multiple CSV files at a time in R.


2 Answers

You can use recursive=T option for list.files,

 lapply(c('1234' ,'1345','1456','1560'),function(x){
     sources.files  <- list.files(path=TF,
                                recursive=T,
                                pattern=paste('*09061*',x,'*.csv',sep='')
                                ,full.names=T)
      ## ou read all files with the id and bind them
      dat <- do.call(rbind,lapply(sources.files,read.csv))
      ### write the file for the 
      write(dat,paste('agg',x,'.csv',sep='')
   }
like image 74
agstudy Avatar answered Sep 21 '22 00:09

agstudy


After some tweaking of agstudy's code, I came up with the solution I was ultimately after. There were a couple of missing pieces that are more due to the nature of my specific problem, so I am leaving agstudy's answer as "accepted".

Turns out a function really wasn't needed. At least not for now. If I need to perform this same task again, I will create a function out of it. For now, I can solve this particular problem without it.

Also, for my instance, I needed a conditional "if" statement to handle any non-csv files that may have lived in the subfolders. By adding an if statement, R throws warnings and skips any files that are not comma-separated.
Code:

##Define directory path##
TF = "H:/working/TC/TMS/Counts/June09" 
##List of subfolder files where file name starts with "0906"##
SF <- list.files(TF,recursive=T, pattern=paste("*09061*",x,'*.csv',sep=""))
##Define the list of files to search for##
x <- (c('1234' ,'1345','1456','1560')
##Create a conditional to skip over the non-csv files in each folder##
if (is.integer(x)){
  sources.files  <- list.files(TF, recursive=T,full.names=T)}

dat <- do.call(rbind,lapply(sources.files,read.csv))
#the warnings thrown are ok--these are generated due to the fact that some of the folders contain .xls files
write.table(dat,file="H:/working/TC/TMS/June09Output/June09Batched.csv",row.names=FALSE,sep=",")
like image 32
myClone Avatar answered Sep 19 '22 00:09

myClone