
Using group_by and mutate in dplyr package to create new factor variable by id variable

Tags: r, dplyr

I have a hierarchical data frame in long format, where each row represents one relationship and a single person can have many of them. Here is code for a small example dataset:

df <- data.frame(id = as.factor(c(1,1,1,2,2,3,4,4)),
             partner = c(1,2,3,1,2,1,1,2),
             kiss = as.factor(c("Yes", "No", "No", "No", "No", "Yes", "Yes", "No")))

  id partner kiss
1  1       1  Yes
2  1       2   No
3  1       3   No
4  2       1   No
5  2       2   No
6  3       1  Yes
7  4       1  Yes
8  4       2   No

I want to create a new factor variable in this dataset that indicates whether the person (indicated by the 'id' variable) never kissed any of their 'partners'. In other words, if the person had a kiss with any of their partners, the new variable would indicate 'No'; it would indicate 'Yes' only if they never had a kiss with any partner. Here is what I think it should look like:

  id partner kiss neverkiss
1  1       1  Yes        No
2  1       2   No        No
3  1       3   No        No
4  2       1   No       Yes
5  2       2   No       Yes
6  3       1  Yes        No
7  4       1  Yes        No
8  4       2   No        No

Ideally, I would like to create this variable without reshaping the dataset, and I would prefer to use the dplyr package. So far, I've thought about using the group_by and mutate functions to create it, but I'm not sure which helper functions would produce this specific variable. I'm open to ideas outside of dplyr, but a dplyr solution would be first prize for me.

RNB asked Dec 10 '22 19:12


2 Answers

This should do it:

library(dplyr)

df <- data.frame(id = as.factor(c(1,1,1,2,2,3,4,4)),
             partner = c(1,2,3,1,2,1,1,2),
             kiss = as.factor(c("Yes", "No", "No", "No", "No", "Yes", "Yes", "No")))

df_new <- df %>%
   group_by(id) %>%
   # one value per id, recycled to every row of that group
   mutate(neverkiss = if (any(kiss == "Yes")) "No" else "Yes")

df_new

If the new column should be a factor, ungroup first and then convert it:

df_new <- df %>%
   group_by(id) %>%
   mutate(neverkiss = if (any(kiss == "Yes")) "No" else "Yes") %>%
   ungroup() %>%
   mutate(neverkiss = as.factor(neverkiss))

class(df_new$neverkiss)
[1] "factor"

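The reason is that, in R versions before 4.1.0, factors can't be combined meaningfully with c():

a <- as.factor(c("Yes", "Yes", "Yes"))
b <- as.factor(c("No", "No", "No"))

c(a, b) # meaningless there: returns the integer codes 1 1 1 1 1 1, not a factor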

Because grouping is still active, mutate builds the vector neverkiss by combining one vector per id (group). Converting to a factor inside the grouped mutate therefore results in a vector with just one level (in this case "No").
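As a small illustration of the point about combining factors, here is a base R sketch that should behave the same regardless of R version. It assumes the two factors a and b from above, converts both to character before combining, and then rebuilds the factor with explicit levels. The name combined and the level order c("No", "Yes") are my own choices for the example.

```r
# Two factors that each know only a single level
a <- factor(c("Yes", "Yes", "Yes"))
b <- factor(c("No", "No", "No"))

# Combine the labels as character, then rebuild the factor with
# explicit levels so that neither level is lost
combined <- factor(
  c(as.character(a), as.character(b)),
  levels = c("No", "Yes")
)

levels(combined)
# [1] "No"  "Yes"
```

Setting the levels explicitly also fixes their order, which can matter later for modelling or plotting.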

Manuel R answered May 24 '23 06:05


We can also do it with data.table:

library(data.table)
# setDT() converts df to a data.table by reference; := adds the column per id
setDT(df)[, neverkiss := if (any(kiss == "Yes")) "No" else "Yes", by = id]
akrun answered May 24 '23 05:05
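Since the question was open to ideas outside of dplyr, the same per-id logic can also be sketched in base R without reshaping. This assumes the df from the question. ave() applies any() within each id and returns one value per row, and the helper name any_kiss is just an illustrative choice.

```r
df <- data.frame(
  id = as.factor(c(1, 1, 1, 2, 2, 3, 4, 4)),
  partner = c(1, 2, 3, 1, 2, 1, 1, 2),
  kiss = as.factor(c("Yes", "No", "No", "No", "No", "Yes", "Yes", "No"))
)

# TRUE for every row of an id that has at least one "Yes"
any_kiss <- ave(df$kiss == "Yes", df$id, FUN = any)

# Build the factor once, on the full column, with explicit levels
df$neverkiss <- factor(ifelse(any_kiss, "No", "Yes"), levels = c("No", "Yes"))

df
```

Because the factor is created on the whole column in one step, nothing has to be combined across groups.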