Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Left Join in R (dplyr) - Too many observations?

I'm using dplyrs left join function in order to match two dataframes.

I have a panel data set A which consists of 4708 rows and 2 columns ID and Name:

ID Name
1  Option1
1  Option2
1  Option3
2  Option2
2  Option3
3  Option1
3  Option4

My dataset B consists of single definitions and categories for each name column (86 rows):

Name        Definition  Category
Option1     Def1         1
Option2     Def2         1
Option3     Def2         2
Option4     Def3         2

So in the end I need following data set C which links the columns of B to A:

ID Name      Definition   Category
1  Option1   Def1         1
1  Option2   Def2         1
1  Option3   Def2         2
2  Option2   Def2         1
2  Option3   Def2         2
3  Option1   Def1         1
3  Option4   Def3         2

I used a left_join command in dplyr to do this:

Data C <- left_join(A,B, by="name")

However, for some reason I got 5355 rows instead of the original 4708, so rows were some added. My understanding was that left_join simply assigns the definitions & categories of B to data set A.

Why do I get more rows ? Or are there any other ways to get the desired data frame C?

like image 214
Kosta S. Avatar asked Mar 13 '18 13:03

Kosta S.


People also ask

How do I use left join in dplyr?

Joins with dplyr. The dplyr package uses SQL database syntax for its join functions. A left join means: Include everything on the left (what was the x data frame in merge() ) and all rows that match from the right (y) data frame. If the join columns have the same name, all you need is left_join(x, y) .

How do I left join multiple data frames in R?

How do I join multiple dataframes in R using dplyr ? This is how you join multiple data sets in R usually. You can use left_join instead of merge if you like. Use Reduce(function(dtf1,dtf2) left_join(dtf1,dtf2,by="index"), list(x,y,z)) .

Can you left join multiple tables in R?

How to Join Multiple Data Frames in R?, you can find it useful to connect many data frames in R. Fortunately, the left join() function from the dplyr package makes this simple to accomplish. We can easily conduct two left joins, one after the other, to combine all three data frames. connect the three data frames.


1 Answers

With left_join(A, B) new rows will be added wherever there are multiple rows in B for which the key columns (same-name columns by default) match the same, single row in A. For example:

library(dplyr)
df1 <- data.frame(col1 = LETTERS[1:4],
                  col2 = 1:4)
df2 <- data.frame(col1 = rep(LETTERS[1:2], 2),
                  col3 = 4:1)

left_join(df1, df2)  # has 6 rows rather than 4
like image 107
Jordi Avatar answered Sep 22 '22 01:09

Jordi