Merge several data.frames into one data.frame with a loop

Tags: loops, for-loop, r

I am trying to merge several data.frames into one data.frame. Since I have a whole list of files I am trying to do it with a loop structure.

So far the loop approach works fine. However, it looks pretty inefficient and I am wondering if there is a faster and easier approach.

Here is the scenario: I have a directory with several .csv files. Each file contains the same identifier columns, which can be used as the merge key. Since the files are rather large, I decided to read them into R one at a time instead of all at once. So I get all the files in the directory with list.files and read in the first two. Afterwards I use merge to get one data.frame.

    FileNames <- list.files(path=".../tempDataFolder/")
    FirstFile <- read.csv(file=paste(".../tempDataFolder/", FileNames[1], sep=""),
                          header=T, na.strings="NULL")
    SecondFile <- read.csv(file=paste(".../tempDataFolder/", FileNames[2], sep=""),
                           header=T, na.strings="NULL")
    dataMerge <- merge(FirstFile, SecondFile,
                       by=c("COUNTRYNAME", "COUNTRYCODE", "Year"), all=T)

Now I use a for loop to get all the remaining .csv files and merge them into the already existing data.frame:

    for(i in 3:length(FileNames)){
      ReadInMerge <- read.csv(file=paste(".../tempDataFolder/", FileNames[i], sep=""),
                              header=T, na.strings="NULL")
      dataMerge <- merge(dataMerge, ReadInMerge,
                         by=c("COUNTRYNAME", "COUNTRYCODE", "Year"), all=T)
    }

Even though it works just fine I was wondering if there is a more elegant way to get the job done?

736

asked Feb 05 '10 18:02

mropa

2 Answers

You may want to look at the closely related question on Stack Overflow.

I would approach this in two steps: import all the data (with plyr), then merge it together:

    filenames <- list.files(path=".../tempDataFolder/", full.names=TRUE)
    library(plyr)
    import.list <- llply(filenames, read.csv)
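If plyr isn't available, base R's lapply does the same job here. The following is a minimal sketch rather than part of the original answer: it writes two small made-up CSV files to a temporary directory (the file names and values are invented for illustration) and reads them back into a list, passing the na.strings="NULL" setting from the question through to read.csv.

```r
# Hypothetical sample data written to a temporary directory for illustration.
tmp <- file.path(tempdir(), "merge_demo_import")
dir.create(tmp, showWarnings = FALSE)
writeLines(c("COUNTRYNAME,COUNTRYCODE,Year,gdp",
             "Austria,AUT,2000,NULL",
             "Belgium,BEL,2000,2.5"),
           file.path(tmp, "gdp.csv"))
writeLines(c("COUNTRYNAME,COUNTRYCODE,Year,pop",
             "Austria,AUT,2000,8.0",
             "Belgium,BEL,2000,10.2"),
           file.path(tmp, "pop.csv"))

filenames <- list.files(path = tmp, pattern = "\\.csv$", full.names = TRUE)

# Extra arguments given to lapply() are forwarded to read.csv() for every file.
import.list <- lapply(filenames, read.csv, header = TRUE, na.strings = "NULL")

# Name the list elements after their files to keep track of where each came from.
names(import.list) <- basename(filenames)
```

The "NULL" entry in gdp.csv is read as NA, so the gdp column stays numeric instead of turning into text.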

That will give you a list of all the files that you now need to merge together. There are many ways to do this, but here's one approach (with Reduce):

    data <- Reduce(function(x, y) merge(x, y, all=T,
                                        by=c("COUNTRYNAME", "COUNTRYCODE", "Year")),
                   import.list, accumulate=F)
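To see what the Reduce call does, here is a small self-contained sketch with three made-up data frames (the gdp, pop and area columns and their values are invented for illustration). Reduce merges the first two list elements, then merges that result with the third, and so on. With all=TRUE, rows that are missing from one input are kept and padded with NA.

```r
keys <- c("COUNTRYNAME", "COUNTRYCODE", "Year")

# Three hypothetical inputs sharing the same key columns.
import.list <- list(
  data.frame(COUNTRYNAME = c("Austria", "Belgium"),
             COUNTRYCODE = c("AUT", "BEL"),
             Year = c(2000, 2000),
             gdp = c(1.5, 2.5)),
  data.frame(COUNTRYNAME = c("Austria", "Belgium"),
             COUNTRYCODE = c("AUT", "BEL"),
             Year = c(2000, 2000),
             pop = c(8.0, 10.2)),
  data.frame(COUNTRYNAME = "Austria",
             COUNTRYCODE = "AUT",
             Year = 2000,
             area = 83.9)
)

# Fold the list from left to right: merge(merge(df1, df2), df3).
merged <- Reduce(function(x, y) merge(x, y, by = keys, all = TRUE), import.list)
```

The result has one row per country and one column per measure. Belgium does not appear in the third data frame, so its area is NA.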

Alternatively, you can do this with the reshape package if you aren't comfortable with Reduce:

    library(reshape)
    data <- merge_recurse(import.list)

173

answered Sep 23 '22 15:09

Shane

If I'm not mistaken, a pretty simple change could eliminate the 3:length(FileNames) kludge:

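    FileNames <- list.files(path=".../tempDataFolder/", full.names=TRUE)
    dataMerge <- NULL
    for(f in FileNames){
      ReadInMerge <- read.csv(file=f, header=T, na.strings="NULL")
      # merge() cannot join an empty data.frame() on named key columns,
      # so the first file seeds the result and later files are merged in.
      dataMerge <- if (is.null(dataMerge)) ReadInMerge else
        merge(dataMerge, ReadInMerge,
              by=c("COUNTRYNAME", "COUNTRYCODE", "Year"), all=T)
    }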
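The same idea can also be written as a fold over the file names, which still holds only the running result and one newly read file in memory at a time. This is a sketch under assumptions, not the answer's code: merge_files is a hypothetical helper name, and the sample files are made up for illustration.

```r
# merge_files() is a hypothetical helper, not a function from any package.
merge_files <- function(files, keys) {
  stopifnot(length(files) >= 1)
  read_one <- function(f) read.csv(f, header = TRUE, na.strings = "NULL")
  # Seed the fold with the first file, then read and merge the remaining
  # files one by one; each file is only read when its turn comes.
  Reduce(function(acc, f) merge(acc, read_one(f), by = keys, all = TRUE),
         files[-1], read_one(files[1]))
}

# Made-up sample files in a temporary directory, for illustration only.
tmp <- file.path(tempdir(), "merge_demo_fold")
dir.create(tmp, showWarnings = FALSE)
writeLines(c("COUNTRYNAME,COUNTRYCODE,Year,gdp", "Austria,AUT,2000,1.5"),
           file.path(tmp, "a.csv"))
writeLines(c("COUNTRYNAME,COUNTRYCODE,Year,pop", "Belgium,BEL,2000,10.2"),
           file.path(tmp, "b.csv"))

dataMerge <- merge_files(list.files(tmp, pattern = "\\.csv$", full.names = TRUE),
                         keys = c("COUNTRYNAME", "COUNTRYCODE", "Year"))
```

Because the first file is passed as the starting value, a directory with a single file simply returns that file's contents unchanged.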

answered Sep 21 '22 15:09

Ken Williams
