
How to match data from two tables with same primary key in R

Tags: dataframe, r

I have two tables with data about people:

df1 <- data.frame(id=c(113,202,377,288,359),
                  name=c("Alex","Silvia","Peter","Jack","Jonny"))

This gives me:

   id   name
1 113   Alex
2 202 Silvia
3 377  Peter
4 288   Jack
5 359  Jonny

And I have a second table containing the names of their family members:

df2 <- data.frame(id=c(113,113,113,202,202,359,359,359,359),
                 family.members=c("Ross","Jefferson","Max","Jo","Michael","Jimmy","Rex","Bill","Larry"))

This provides me with:

> df2
   id family.members
1 113           Ross
2 113      Jefferson
3 113            Max
4 202             Jo
5 202        Michael
6 359          Jimmy
7 359            Rex
8 359           Bill
9 359          Larry

Now I want to extend table 1 with an additional column containing the number of family members for each person:

   id   name no.family.members
1 113   Alex                 3
2 202 Silvia                 2
3 377  Peter                 0
4 288   Jack                 0
5 359  Jonny                 4

What is the best way to create the third table in R?

Thank you very much in advance!

jeffrey asked Nov 06 '14 20:11


2 Answers

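df1 <- df1[order(df1$id), ]  # Sort by id so the rows line up with the sorted counts below
# tapply() returns the counts named by id, in increasing id order
counts <- with(df2, tapply(family.members, id, length))
# Fill in only the rows whose id appears in df2; the others stay NA
df1$no.family.members[df1$id %in% names(counts)] <- counts
df1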
   id   name no.family.members
1 113   Alex                 3
2 202 Silvia                 2
4 288   Jack                NA
5 359  Jonny                 4
3 377  Peter                NA

(I think NA is a more informative result than 0.)
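If you would rather not reorder df1, a lookup with match() is one alternative. The sketch below is a variation on the code above, not part of the original answer; it rebuilds the question's data so it runs on its own, and the name df1_zero is made up for illustration.

```r
# Data from the question
df1 <- data.frame(id = c(113, 202, 377, 288, 359),
                  name = c("Alex", "Silvia", "Peter", "Jack", "Jonny"))
df2 <- data.frame(id = c(113, 113, 113, 202, 202, 359, 359, 359, 359),
                  family.members = c("Ross", "Jefferson", "Max", "Jo", "Michael",
                                     "Jimmy", "Rex", "Bill", "Larry"))

# Count the rows of df2 per id; names(counts) holds the ids as strings
counts <- table(df2$id)

# For each id in df1, find its position in the counts (NA if it is absent from df2)
idx <- match(as.character(df1$id), names(counts))
df1$no.family.members <- as.vector(counts)[idx]

# Hypothetical copy with 0 instead of NA, as in the question's desired table
df1_zero <- df1
df1_zero$no.family.members[is.na(df1_zero$no.family.members)] <- 0L
```

Because the lookup is done by id rather than by position, df1 keeps its original row order and ids that occur only in df2 are simply ignored.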

IRTFM answered Sep 25 '22 05:09


Using dplyr

library(dplyr)
df1 <- df1 %>%
    left_join(
        df2 %>%
            group_by(id) %>%
            summarize(no.family.members = n()),
        by = "id"
    )

With dplyr >= 0.3.0.2 this can be shortened with count(), which names the count column n:

df3 <- df1 %>% left_join(df2 %>% count(id))
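
Both joins leave NA for people with no rows in df2, while the question asks for 0. A minimal sketch of one way to get there, assuming a dplyr release recent enough that count() accepts a name argument (this goes beyond what the answer above relies on); the data are rebuilt from the question so the block runs on its own.

```r
library(dplyr)

# Data from the question
df1 <- data.frame(id = c(113, 202, 377, 288, 359),
                  name = c("Alex", "Silvia", "Peter", "Jack", "Jonny"))
df2 <- data.frame(id = c(113, 113, 113, 202, 202, 359, 359, 359, 359),
                  family.members = c("Ross", "Jefferson", "Max", "Jo", "Michael",
                                     "Jimmy", "Rex", "Bill", "Larry"))

df3 <- df1 %>%
  # Count rows per id and name the count column directly
  left_join(count(df2, id, name = "no.family.members"), by = "id") %>%
  # Ids missing from df2 come back as NA after the join; turn those into 0
  mutate(no.family.members = coalesce(no.family.members, 0L))
```

left_join() keeps every row of df1 in its original order, so df3 has the same five people as the table in the question.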
Gregor Thomas answered Sep 24 '22 05:09