
Merge several data.frames into one data.frame with a loop

Tags: loops, for-loop, r

I am trying to merge several data.frames into one data.frame. Since I have a whole list of files I am trying to do it with a loop structure.

So far the loop approach works fine. However, it looks pretty inefficient and I am wondering if there is a faster and easier approach.

Here is the scenario: I have a directory with several .csv files. Each file contains the same identifier columns, which can be used as the merge key. Since the files are rather large, I decided to read them into R one at a time instead of all at once. So I get all the files of the directory with list.files and read in the first two files. Afterwards I use merge to get one data.frame.

    FileNames <- list.files(path=".../tempDataFolder/")

    FirstFile <- read.csv(file=paste(".../tempDataFolder/", FileNames[1], sep=""),
                          header=T, na.strings="NULL")
    SecondFile <- read.csv(file=paste(".../tempDataFolder/", FileNames[2], sep=""),
                           header=T, na.strings="NULL")

    dataMerge <- merge(FirstFile, SecondFile,
                       by=c("COUNTRYNAME", "COUNTRYCODE", "Year"), all=T)

Now I use a for loop to get all the remaining .csv files and merge them into the already existing data.frame:

    for(i in 3:length(FileNames)){
      ReadInMerge <- read.csv(file=paste(".../tempDataFolder/", FileNames[i], sep=""),
                              header=T, na.strings="NULL")
      dataMerge <- merge(dataMerge, ReadInMerge,
                         by=c("COUNTRYNAME", "COUNTRYCODE", "Year"), all=T)
    }

Even though it works just fine, I was wondering whether there is a more elegant way to get the job done.

Asked by mropa on Feb 05 '10 at 18:02



2 Answers

You may want to look at the closely related question on Stack Overflow.

I would approach this in two steps: import all the data (with plyr), then merge it together:

    filenames <- list.files(path=".../tempDataFolder/", full.names=TRUE)
    library(plyr)
    import.list <- llply(filenames, read.csv)
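
If plyr is not installed, the same import step can probably be done in base R with lapply. The following is only a sketch under stated assumptions: it writes two made-up CSV files to a fresh temporary folder (the file names, columns, and values are invented for illustration) and reads them back with na.strings="NULL", as in the question.

```r
# Hypothetical toy files in a fresh temporary folder, standing in for tempDataFolder
tmp <- tempfile("mergeDemo")
dir.create(tmp)
writeLines(c("COUNTRYCODE,Year,gdp", "DE,2000,1.9", "FR,2000,1.4"),
           file.path(tmp, "gdp.csv"))
writeLines(c("COUNTRYCODE,Year,pop", "DE,2000,82", "FR,2000,NULL"),
           file.path(tmp, "pop.csv"))

# full.names=TRUE returns complete paths, so no paste() is needed later
filenames <- list.files(path = tmp, pattern = "\\.csv$", full.names = TRUE)

# lapply passes the extra arguments on to read.csv for every file
import.list <- lapply(filenames, read.csv, header = TRUE, na.strings = "NULL")
names(import.list) <- basename(filenames)
```

Naming the list elements after their files is optional, but it makes it easier to see which data.frame came from which file before merging.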

That will give you a list of all the files that you now need to merge together. There are many ways to do this, but here's one approach (with Reduce):

    data <- Reduce(function(x, y) merge(x, y, all=T,
                                        by=c("COUNTRYNAME", "COUNTRYCODE", "Year")),
                   import.list, accumulate=F)
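
To see what Reduce does here, a small self-contained sketch may help. The three data.frames below are invented stand-ins for the imported files, and the key columns are shortened to COUNTRYCODE and Year for brevity.

```r
# Hypothetical stand-ins for the imported files; all share the key columns
keys <- c("COUNTRYCODE", "Year")
import.list <- list(
  data.frame(COUNTRYCODE = c("DE", "FR"), Year = 2000, gdp = c(1.9, 1.4)),
  data.frame(COUNTRYCODE = c("DE", "IT"), Year = 2000, pop = c(82, 57)),
  data.frame(COUNTRYCODE = "FR", Year = 2000, area = 549)
)

# Reduce folds merge() over the list from left to right,
# i.e. merge(merge(df1, df2), df3)
merged <- Reduce(function(x, y) merge(x, y, by = keys, all = TRUE), import.list)
```

With all = TRUE each step is a full outer join, so a country that is missing from one file still appears in the result, with NA in that file's columns.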

Alternatively, you can do this with the reshape package if you aren't comfortable with Reduce:

    library(reshape)
    data <- merge_recurse(import.list)
Answered by Shane on Sep 23 '22 at 15:09


If I'm not mistaken, a pretty simple change could eliminate the 3:length(FileNames) kludge:

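    FileNames <- list.files(path=".../tempDataFolder/", full.names=TRUE)
    dataMerge <- NULL
    for(f in FileNames){
      ReadInMerge <- read.csv(file=f, header=T, na.strings="NULL")
      # Start from the first file: merge() cannot join on key columns
      # that an empty data.frame() does not have
      dataMerge <- if(is.null(dataMerge)) ReadInMerge else
        merge(dataMerge, ReadInMerge,
              by=c("COUNTRYNAME", "COUNTRYCODE", "Year"), all=T)
    }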
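
One caveat with a single loop over all files: merge() raises an error when the by columns are missing on one side, which is the case for an empty data.frame(). The sketch below uses invented toy data with a hypothetical key column id to show the failure and one possible way around it, namely seeding the accumulator with NULL.

```r
keys <- "id"  # hypothetical key column for the toy data
frames <- list(data.frame(id = 1:2, a = c(10, 20)),
               data.frame(id = 2:3, b = c("x", "y")))

# Merging into an empty data.frame() fails: it has no "id" column to match on
attempt <- tryCatch(merge(data.frame(), frames[[1]], by = keys, all = TRUE),
                    error = function(e) e)
failed <- inherits(attempt, "error")

# Seeding with NULL and taking the first frame as-is avoids the problem
dataMerge <- NULL
for (df in frames) {
  dataMerge <- if (is.null(dataMerge)) df else
    merge(dataMerge, df, by = keys, all = TRUE)
}
```

The loop still reads like the original, but the first iteration simply adopts the first data.frame instead of merging it with nothing.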
Answered by Ken Williams on Sep 21 '22 at 15:09