I have two dataframes. In the first one, I have a KEY/ID column and two variables:
KEY V1 V2
1 10 2
2 20 4
3 30 6
4 40 8
5 50 10
In the second dataframe, I have a KEY/ID column and a third variable:
KEY V3
1 5
2 10
3 20
I would like to extract the rows of the first dataframe that are also in the second dataframe by matching them on the KEY column. I would also like to add the V3 column to the final dataset:
KEY V1 V2 V3
1 10 2 5
2 20 4 10
3 30 6 20
These are my attempts, using the subset and merge functions:
subset(data1, data1$KEY == data2$KEY)
merge(data1, data2, by.x = "KEY", by.y = "KEY")
None of them does the task.
Any hint would be appreciated. Thank you!
In pandas, we can join columns from two DataFrames using the merge() function. This is similar to the SQL 'join' functionality. A detailed discussion of different join types is given in the SQL lesson. You specify the type of join you want using the how parameter.
When we concatenate DataFrames, we need to specify the axis. axis=0 tells pandas to stack the second DataFrame UNDER the first one. It will automatically detect whether the column names are the same and will stack accordingly. axis=1 will stack the columns in the second DataFrame to the RIGHT of the first DataFrame.
merge(data1, data2, by="KEY")
should do it!
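For a reproducible check, here is a minimal sketch that rebuilds the two example tables from the question and runs the merge. The data.frame() construction is an assumption about how your data are stored; the values are the ones you posted.

```r
# Rebuild the example data from the question (assumed to be plain data frames)
data1 <- data.frame(KEY = 1:5,
                    V1  = c(10, 20, 30, 40, 50),
                    V2  = c(2, 4, 6, 8, 10))
data2 <- data.frame(KEY = 1:3,
                    V3  = c(5, 10, 20))

# merge() performs an inner join by default (all = FALSE),
# so only KEYs present in both data frames are kept
result <- merge(data1, data2, by = "KEY")
print(result)
```

If you ever need to keep the unmatched rows of data1 as well, all.x = TRUE turns this into a left join.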
If what you want is an inner join, then your attempt should do it. If it doesn't, check the types of the KEY columns in both tables using class(data1$KEY) and class(data2$KEY).
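A sketch of that check, under a purely hypothetical scenario in which data2$KEY was read in as text with trailing spaces. The real cause in your data may be different, so treat this as an illustration of the diagnosis rather than a guaranteed fix.

```r
# Hypothetical situation: KEY in data2 was read as character with stray spaces
data1 <- data.frame(KEY = 1:5,
                    V1  = c(10, 20, 30, 40, 50),
                    V2  = c(2, 4, 6, 8, 10))
data2 <- data.frame(KEY = c("1 ", "2 ", "3 "),
                    V3  = c(5, 10, 20),
                    stringsAsFactors = FALSE)

# Compare the key types in both data frames
class(data1$KEY)
class(data2$KEY)

# Strip whitespace and convert to the same type as data1$KEY before merging
data2$KEY <- as.integer(trimws(data2$KEY))
fixed <- merge(data1, data2, by = "KEY")
print(fixed)
```

Once both key columns have the same type and the same clean values, the merge should return the three matching rows.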
Apart from this and the merge suggested by Christian, you can use:
library(plyr)
join(data1, data2, by="KEY", type="inner")
or
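library(data.table)
# setkey() needs data.table objects, so convert the data frames first
data1 <- as.data.table(data1)
data2 <- as.data.table(data2)
setkey(data1, KEY)
setkey(data2, KEY)
# keyed inner join: keep only the rows of data1 whose KEY is also in data2
data1[data2, nomatch = 0]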