
merging in R keeping all rows of a data set

Tags: merge, r

I have two data frames.

distinct_paper_year_data:

author_id      distinct_paper_year_count
     1                         3
     2                         1
     4                         1
     5                         4 

author_data:

author_id    paper_id  confirmed
   1         25733         1
   2         47276         1
   3         79468         1
   4         12856         0

Now I want to merge them so that the output looks like this:

author_id  paper_id     confirmed    distinct_paper_year_count
 1            25733          1               3
 2            47276          1               1
 3            79468          1               0
 4            12856          0               1

I need every author_id present in author_data to appear in the final output. Since distinct_paper_year_data has no row for author_id == 3, the distinct_paper_year_count column should be zero for that author in the final result.

Using merge, I get:

merge(distinct_paper_year_data, author_data, by = "author_id")

author_id    distinct_paper_year_count paper_id confirmed
     1                         3       25733         1
     2                         1       47276         1
     4                         1       12856         0

How can the desired output be attained?

user3171906 asked Mar 31 '14 06:03



1 Answer

You need an outer join:

merge(distinct_paper_year_data, author_data, by = "author_id", all = TRUE)

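NB: You'll get NA in the rows where the tables don't match: author_id 3 has no distinct_paper_year_count, and author_id 5 has no paper_id or confirmed. You can then replace the NAs as needed, for example with 0. You can also use all.x = TRUE or all.y = TRUE instead of all to do a left or right outer join; in the call above, all.y = TRUE keeps exactly the author_ids present in author_data.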
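As a minimal sketch of the left-join variant, the following rebuilds the two data frames from the question, keeps every row of author_data, and then sets the unmatched count to 0. Putting author_data first (so that all.x = TRUE applies to it) and the name `result` are choices made for this illustration, not part of the original answer.

```r
# Sample data copied from the question
distinct_paper_year_data <- data.frame(
  author_id = c(1, 2, 4, 5),
  distinct_paper_year_count = c(3, 1, 1, 4)
)
author_data <- data.frame(
  author_id = c(1, 2, 3, 4),
  paper_id = c(25733, 47276, 79468, 12856),
  confirmed = c(1, 1, 1, 0)
)

# Left outer join: keep every row of author_data (the x table here),
# drop author_id 5, which exists only in distinct_paper_year_data
result <- merge(author_data, distinct_paper_year_data,
                by = "author_id", all.x = TRUE)

# Authors without a match (author_id 3) get NA; replace those with 0
result$distinct_paper_year_count[is.na(result$distinct_paper_year_count)] <- 0

print(result)
```

With author_data as the first argument, the columns also come out in the order the question asks for: author_id, paper_id, confirmed, distinct_paper_year_count.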

Finally, check out data.table for faster joins (and more functionality).
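For comparison, here is a rough data.table sketch of the same left join, assuming the data.table package is installed. The names `distinct_dt`, `author_dt` and `result_dt` are made up for this example. In the `X[Y, on = ...]` form, every row of Y is kept, so author_dt goes inside the brackets.

```r
library(data.table)

# Same sample data as in the question, as data.tables
distinct_dt <- data.table(
  author_id = c(1, 2, 4, 5),
  distinct_paper_year_count = c(3, 1, 1, 4)
)
author_dt <- data.table(
  author_id = c(1, 2, 3, 4),
  paper_id = c(25733, 47276, 79468, 12856),
  confirmed = c(1, 1, 1, 0)
)

# X[Y, on = ...] returns one row per row of Y (author_dt),
# with NA where X (distinct_dt) has no matching author_id
result_dt <- distinct_dt[author_dt, on = "author_id"]

# Replace the NA counts with 0 in place
result_dt[is.na(distinct_paper_year_count), distinct_paper_year_count := 0]

print(result_dt)
```

merge() with all.x = TRUE also works on data.tables, so the base R call style can be kept if that reads more naturally.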

Michele answered Sep 29 '22 18:09
