Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Merging data by 2 variables in R

Tags:

merge

r

I am attempting to merge two data sets. In the past I have used merge() with by equal to the variable I want to merge by. However, now I would like to do so with two variables. My first data set looks something like this:

Year   Winning_Tm    Losing_Tm
2011   Texas         Washington
2012   Alabama       South Carolina
2013   Tennessee     Texas

Then I have another data set with a rank of each team (this is very simplified) for each year. Like this:

Year    Team             Rank
2011    Texas            32
2011    Washington       34
2012    South Carolina   45
2012    Alabama          12
2013    Texas            6
2013    Tennessee        51

I would like to merge them so I have a data set that looks like this:

Year   Winning_Tm    Winning_TM_rank    Losing_Tm        Losing_Tm_rank
2011   Texas         32                 Washington       34
2012   Alabama       12                 South Carolina   45
2013   Tennessee     51                 Texas            6

My hope is that there is a simple way to do this but it may be more complicated. Thanks!

like image 923
a.powell Avatar asked Aug 25 '16 17:08

a.powell


Video Answer


1 Answers

I reproduced your data (try to include a dput of it next time):

A <- data.frame(
  Year = c(2011, 2012, 2013),
  Winning_Tm = c("Texas","Alabama","Tennessee"),
  Losing_Tm = c("Washington","South Carolina", "Texas"),
  stringsAsFactors = FALSE
)

B <- data.frame(
  Year = c("2011","2011","2012","2012","2013","2013"),
  Team = c("Texas","Washington","South Carolina","Alabama","Texas","Tennessee"),
  Rank = c(32,34,45,12,6,51),
  stringsAsFactors = FALSE
)

You can melt the first dataframe using the reshape2 package:

library(reshape2)

A <- melt(A, id.vars = "Year")
names(A)[3] <- "Team"

Now it looks like this:

> A
  Year   variable           Team
1 2011 Winning_Tm          Texas
2 2012 Winning_Tm        Alabama
3 2013 Winning_Tm      Tennessee
4 2011  Losing_Tm     Washington
5 2012  Losing_Tm South Carolina
6 2013  Losing_Tm          Texas

You can then merge the datasets together by the two columns of interest:

AB <- merge(A, B, by=c("Year","Team"))

Which looks like this:

> AB
  Year           Team   variable Rank
1 2011          Texas Winning_Tm   32
2 2011     Washington  Losing_Tm   34
3 2012        Alabama Winning_Tm   12
4 2012 South Carolina  Losing_Tm   45
5 2013      Tennessee Winning_Tm   51
6 2013          Texas  Losing_Tm    6

Then using the reshape command from base R you can change AB to a wide format:

reshape(AB, idvar = "Year", timevar = "variable", direction = "wide")

The result:

  Year Team.Winning_Tm Rank.Winning_Tm Team.Losing_Tm Rank.Losing_Tm
1 2011           Texas              32     Washington             34
3 2012         Alabama              12 South Carolina             45
5 2013       Tennessee              51          Texas              6
like image 68
Warner Avatar answered Sep 28 '22 02:09

Warner