Merging two data frames

Tags: merge, dataframe, r

I have the following data frame:

Date,Year,Austria,Germany,...
1969-12-31,1969,96.743,95.768,...
1970-01-30,1970,95.515,95.091,...
1970-02-27,1970,95.075,95.235,...

Ultimately, I would like to merge this data frame with another one that looks like this:

Year,Country,Exp,...
1969,Austria,1,...
1970,Austria,0,...
1969,Germany,0,...
1970,Germany,1,...

The way I see it, I would have to change the first data frame to the following format:

Date,Year,Country,Exp,...
1969-12-31,1969,Austria,96.743,...
1970-01-30,1970,Austria,95.515,...
1970-02-27,1970,Austria,95.075,...
1969-12-31,1969,Germany,95.768,...
1970-01-30,1970,Germany,95.091,...
1970-02-27,1970,Germany,95.235,...

Then, I can just use the merge function and merge them (one-to-many) using Year and Country.

I have tried to transform the data frame as suggested above. However, the only way I can think of is to use a couple of complicated "for" loops. I would greatly appreciate an easier approach. Also, if you think these two data frames can be merged in a simpler way, that would be great too.

rp1 asked Oct 17 '12 19:10





1 Answer

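You need to melt the first data frame from wide to long format.

library(reshape)
# id.vars takes a character vector of column names, not one comma-separated string
dat <- melt(dat, id.vars=c("Date","Year"))

Rename the new columns to match your other data.frame; in particular, the "variable" column that melt creates should become "Country".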

Then merge on the keys the two data frames share, Year and Country (or you might prefer to join, using the plyr package):

merge(dat, dat2, by=c("Year","Country"))

or:

library(plyr)
join(dat, dat2, by=c("Year","Country"))

I prefer the join function because it behaves more intuitively than merge, especially when there are NA values.
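As a rough end-to-end sketch of the steps above, here is the same reshape-then-merge pipeline on the sample rows from the question. To keep it runnable without extra packages it uses only base R in place of melt, which is a substitution for illustration and not the approach the answer itself uses. The column name Index for the melted values and the object names long and merged are assumptions made for this example.

```r
# Wide index data: the three sample rows from the question
dat <- data.frame(
  Date    = as.Date(c("1969-12-31", "1970-01-30", "1970-02-27")),
  Year    = c(1969, 1970, 1970),
  Austria = c(96.743, 95.515, 95.075),
  Germany = c(95.768, 95.091, 95.235)
)

# Yearly data: one row per Year/Country pair
dat2 <- data.frame(
  Year    = c(1969, 1970, 1969, 1970),
  Country = c("Austria", "Austria", "Germany", "Germany"),
  Exp     = c(1, 0, 0, 1)
)

# Wide -> long in base R: repeat Date and Year once per country column
# and stack the country values into a single column.
# "Index" is a hypothetical name for the stacked values.
countries <- c("Austria", "Germany")
long <- data.frame(
  Date    = rep(dat$Date, times = length(countries)),
  Year    = rep(dat$Year, times = length(countries)),
  Country = rep(countries, each = nrow(dat)),
  Index   = unlist(dat[countries], use.names = FALSE)
)

# Many-to-one merge on the keys both data frames share
merged <- merge(long, dat2, by = c("Year", "Country"))

# merge() sorts by the key columns; reorder by country and date for reading
merged <- merged[order(merged$Country, merged$Date), ]
print(merged)
```

Every row of long has a match in dat2 here, so the default merge keeps all six rows. With real data, note that merge drops unmatched rows unless you pass all.x = TRUE, in which case the missing Exp values come back as NA.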

Brandon Bertelsen answered Sep 30 '22 14:09
