I'm using dplyrs left join function in order to match two dataframes.
I have a panel data set A which consists of 4708 rows and 2 columns ID and Name:
ID Name
1 Option1
1 Option2
1 Option3
2 Option2
2 Option3
3 Option1
3 Option4
My dataset B consists of single definitions and categories for each name column (86 rows):
Name Definition Category
Option1 Def1 1
Option2 Def2 1
Option3 Def2 2
Option4 Def3 2
So in the end I need following data set C which links the columns of B to A:
ID Name Definition Category
1 Option1 Def1 1
1 Option2 Def2 1
1 Option3 Def2 2
2 Option2 Def2 1
2 Option3 Def2 2
3 Option1 Def1 1
3 Option4 Def3 2
I used a left_join command in dplyr to do this:
Data C <- left_join(A,B, by="name")
However, for some reason I got 5355 rows instead of the original 4708, so rows were some added. My understanding was that left_join simply assigns the definitions & categories of B to data set A.
Why do I get more rows ? Or are there any other ways to get the desired data frame C?
Joins with dplyr. The dplyr package uses SQL database syntax for its join functions. A left join means: Include everything on the left (what was the x data frame in merge() ) and all rows that match from the right (y) data frame. If the join columns have the same name, all you need is left_join(x, y) .
How do I join multiple dataframes in R using dplyr ? This is how you join multiple data sets in R usually. You can use left_join instead of merge if you like. Use Reduce(function(dtf1,dtf2) left_join(dtf1,dtf2,by="index"), list(x,y,z)) .
How to Join Multiple Data Frames in R?, you can find it useful to connect many data frames in R. Fortunately, the left join() function from the dplyr package makes this simple to accomplish. We can easily conduct two left joins, one after the other, to combine all three data frames. connect the three data frames.
With left_join(A, B)
new rows will be added wherever there are multiple rows in B
for which the key columns (same-name columns by default) match the same, single row in A
. For example:
library(dplyr)
df1 <- data.frame(col1 = LETTERS[1:4],
col2 = 1:4)
df2 <- data.frame(col1 = rep(LETTERS[1:2], 2),
col3 = 4:1)
left_join(df1, df2) # has 6 rows rather than 4
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With