I have two tables with data about people:
df1 <- data.frame(id = c(113, 202, 377, 288, 359),
                  name = c("Alex", "Silvia", "Peter", "Jack", "Jonny"))
This gives me:
   id   name
1 113   Alex
2 202 Silvia
3 377  Peter
4 288   Jack
5 359  Jonny
And I have a second table containing the names of their family members:
df2 <- data.frame(id = c(113, 113, 113, 202, 202, 359, 359, 359, 359),
                  family.members = c("Ross", "Jefferson", "Max", "Jo", "Michael",
                                     "Jimmy", "Rex", "Bill", "Larry"))
This provides me with:
> df2
   id family.members
1 113           Ross
2 113      Jefferson
3 113            Max
4 202             Jo
5 202        Michael
6 359          Jimmy
7 359            Rex
8 359           Bill
9 359          Larry
Now I want to extend table 1 with an additional column containing the number of family members for each person:
   id   name no.family.members
1 113   Alex                 3
2 202 Silvia                 2
3 377  Peter                 0
4 288   Jack                 0
5 359  Jonny                 4
What is the best way to create the third table in R?
Thank you very much in advance!
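# Sort df1 by id so its rows line up with the counts below
df1 <- df1[order(df1$id), ]
# tapply returns the counts named by id and ordered by id
counts <- with(df2, tapply(family.members, id, length))
# Fill in the ids that have counts; this assumes every id in df2 also occurs in df1
df1$no.family.members[df1$id %in% names(counts)] <- counts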
df1
   id   name no.family.members
1 113   Alex                 3
2 202 Silvia                 2
4 288   Jack                NA
5 359  Jonny                 4
3 377  Peter                NA
(I think NA is a more informative result than 0.)
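If zeros are wanted, as in the table shown in the question, one possible base R sketch is to count against a factor whose levels are the ids of df1. This is only an illustration, not the approach above: it assumes the ids in df1 are unique, and any id in df2 that is missing from df1 is silently ignored.

```r
# Same example data as in the question.
df1 <- data.frame(id = c(113, 202, 377, 288, 359),
                  name = c("Alex", "Silvia", "Peter", "Jack", "Jonny"))
df2 <- data.frame(id = c(113, 113, 113, 202, 202, 359, 359, 359, 359),
                  family.members = c("Ross", "Jefferson", "Max", "Jo", "Michael",
                                     "Jimmy", "Rex", "Bill", "Larry"))

# Treat df2$id as a factor whose levels are the ids of df1, in df1's row order.
# table() reports every level, including those that never occur (count 0).
df1$no.family.members <- as.vector(table(factor(df2$id, levels = df1$id)))
df1
```

Because the counts come back in the order of the factor levels, df1 does not need to be sorted first and keeps its original row order.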
Using dplyr:
library(dplyr)
df1 <- df1 %>%
  left_join(
    df2 %>%
      group_by(id) %>%
      summarize(no.family.members = n())
  )
With dplyr >= 0.3.0.2 this can be shortened to the following (the count column is then named n):
df3 <- df1 %>% left_join(df2 %>% count(id))
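The left join leaves NA for people without any rows in df2. If zeros are preferred, a sketch along these lines may work; it assumes a reasonably recent dplyr in which count() accepts a name argument, and it uses coalesce() to replace the missing counts.

```r
library(dplyr)

# Same example data as in the question.
df1 <- data.frame(id = c(113, 202, 377, 288, 359),
                  name = c("Alex", "Silvia", "Peter", "Jack", "Jonny"))
df2 <- data.frame(id = c(113, 113, 113, 202, 202, 359, 359, 359, 359),
                  family.members = c("Ross", "Jefferson", "Max", "Jo", "Michael",
                                     "Jimmy", "Rex", "Bill", "Larry"))

df3 <- df1 %>%
  # Count rows per id and name the count column directly.
  left_join(count(df2, id, name = "no.family.members"), by = "id") %>%
  # Ids without a match in df2 have NA after the join; replace those with 0.
  mutate(no.family.members = coalesce(no.family.members, 0L))
df3
```

Passing by = "id" explicitly also avoids the message dplyr prints when it has to guess the join column.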