I have two data frames:
distinct_paper_year_data:
author_id  distinct_paper_year_count
1          3
2          1
4          1
5          4
author_data:
author_id  paper_id  confirmed
1          25733     1
2          47276     1
3          79468     1
4          12856     0
Now I want to merge them so that the output looks like this:
author_id  paper_id  confirmed  distinct_paper_year_count
1          25733     1          3
2          47276     1          1
3          79468     1          0
4          12856     0          1
Every author_id present in author_data must appear in the final output. Since there is no row for author_id == 3 in distinct_paper_year_data, the value of the distinct_paper_year_count column should be zero for that author in the final result.
Using merge, I get:
merge(distinct_paper_year_data, author_data, by = "author_id")
author_id  distinct_paper_year_count  paper_id  confirmed
1          3                          25733     1
2          1                          47276     1
4          1                          12856     0
How can the desired output be attained?
You need an outer join:
merge(distinct_paper_year_data, author_data, by = "author_id", all = TRUE)
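NB: You'll get NA for those rows where the tables don't match, like author_id in {3, 5}. You can simply replace the NAs afterwards if you need to. You can also use all.x or all.y to do a left or right outer join. In this call, all.y = TRUE keeps only the rows of author_data, which drops author_id 5 and matches the output you asked for.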
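As a minimal sketch of the left join plus NA replacement described above, the following rebuilds the two data frames from the question and produces the requested table. Putting author_data first and using all.x = TRUE is my own choice here (it also gives the column order from the question); the variable name result is illustrative.

```r
# Sample data copied from the question
distinct_paper_year_data <- data.frame(
  author_id = c(1, 2, 4, 5),
  distinct_paper_year_count = c(3, 1, 1, 4)
)
author_data <- data.frame(
  author_id = c(1, 2, 3, 4),
  paper_id  = c(25733, 47276, 79468, 12856),
  confirmed = c(1, 1, 1, 0)
)

# all.x = TRUE keeps every row of the first table (author_data),
# so author_id 3 survives the merge and author_id 5 is dropped
result <- merge(author_data, distinct_paper_year_data,
                by = "author_id", all.x = TRUE)

# Authors without a match get NA in the count column; set those to 0
result$distinct_paper_year_count[is.na(result$distinct_paper_year_count)] <- 0

print(result)
```

The same result can be had with the original argument order and all.y = TRUE; only the column order differs.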
Finally, check out data.table for faster joins (and more functionality).
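For comparison, here is a rough sketch of the same join in data.table, assuming the package is installed. The table names counts and authors are made up for this example; the X[Y, on = ...] form keeps every row of Y and looks up matches in X.

```r
library(data.table)

# Sample data copied from the question (integer columns for simplicity)
counts <- data.table(
  author_id = c(1L, 2L, 4L, 5L),
  distinct_paper_year_count = c(3L, 1L, 1L, 4L)
)
authors <- data.table(
  author_id = c(1L, 2L, 3L, 4L),
  paper_id  = c(25733L, 47276L, 79468L, 12856L),
  confirmed = c(1L, 1L, 1L, 0L)
)

# Keep every row of authors; unmatched authors get NA for the count
joined <- counts[authors, on = "author_id"]

# Replace the NA counts with 0 in place
joined[is.na(distinct_paper_year_count), distinct_paper_year_count := 0L]

print(joined)
```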