I have a hierarchical data frame in long format, where each row represents a relationship, and many relationships can belong to a single person. Here is code for a small example dataset:
df <- data.frame(id = as.factor(c(1,1,1,2,2,3,4,4)),
partner = c(1,2,3,1,2,1,1,2),
kiss = as.factor(c("Yes", "No", "No", "No", "No", "Yes", "Yes", "No")))
id partner kiss
1 1 1 Yes
2 1 2 No
3 1 3 No
4 2 1 No
5 2 2 No
6 3 1 Yes
7 4 1 Yes
8 4 2 No
I want to create a new factor variable in this dataset that indicates whether the person (identified by the 'id' variable) never kissed any of their partners. In other words, if the person kissed any of their partners, the new variable should be 'No'; if they never kissed any partner, it should be 'Yes'. Here is what I think it should look like:
id partner kiss neverkiss
1 1 1 Yes No
2 1 2 No No
3 1 3 No No
4 2 1 No Yes
5 2 2 No Yes
6 3 1 Yes No
7 4 1 Yes No
8 4 2 No No
Ideally, I would like to create this variable without reshaping the dataset. I would also prefer to use the dplyr package. So far, I've thought about using its group_by() and mutate() functions, but I'm not sure which helper functions I can use to create this specific variable. I'm open to ideas outside of dplyr, but a dplyr solution would be first prize for me.
This should do it:
library(dplyr)
df <- data.frame(id = as.factor(c(1,1,1,2,2,3,4,4)),
partner = c(1,2,3,1,2,1,1,2),
kiss = as.factor(c("Yes", "No", "No", "No", "No", "Yes", "Yes", "No")))
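df_new <- df %>%
  group_by(id) %>%
  # any() collapses each person's kiss values to a single TRUE/FALSE;
  # the resulting "No"/"Yes" is recycled to every row of that person's group
  mutate(neverkiss = if (any(kiss == "Yes")) "No" else "Yes")
df_new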
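The helper doing the work here is any(). As a small base R sketch (not part of the original answer), tapply() shows the one value per person that any() computes before it is spread back over the rows:

```r
# Rebuild the example data from the question
df <- data.frame(id = as.factor(c(1, 1, 1, 2, 2, 3, 4, 4)),
                 partner = c(1, 2, 3, 1, 2, 1, 1, 2),
                 kiss = as.factor(c("Yes", "No", "No", "No", "No", "Yes", "Yes", "No")))

# One TRUE/FALSE per person: did they kiss any of their partners?
kissed_any <- tapply(df$kiss == "Yes", df$id, any)
kissed_any
```

Only person 2 has FALSE here, which is exactly the person who should get neverkiss = "Yes".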
If the new column should be a factor, ungroup first and convert afterwards:
df_new <- df %>%
group_by(id) %>%
mutate("neverkiss" = {if (any(kiss == "Yes")) "No" else "Yes"}) %>%
ungroup() %>%
mutate("neverkiss" = as.factor(neverkiss))
class(df_new$neverkiss)
[1] "factor"
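The reason is that factors can't be combined reliably with c(): before R 4.1.0, c() drops the levels and returns only the underlying integer codes.
a <- as.factor(c("Yes", "Yes", "Yes"))
b <- as.factor(c("No", "No", "No"))
c(a, b) # before R 4.1.0: 1 1 1 1 1 1, the labels are lost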
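If you do need to join factors by hand, one base R sketch that behaves the same in any R version is to go through character and let factor() rebuild the levels from the combined values:

```r
a <- as.factor(c("Yes", "Yes", "Yes"))
b <- as.factor(c("No", "No", "No"))

# as.character() keeps the labels instead of the integer codes;
# factor() then builds the level set from all values (sorted: "No", "Yes")
combined <- factor(c(as.character(a), as.character(b)))
combined
levels(combined)
```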
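While grouping is still active, mutate() builds the neverkiss vector by combining the piece computed for each id (group). With the dplyr version this answer was written for, that combination ended up with just one level (in this case "No", from the first group). Ungrouping first and converting the whole column at once avoids the problem regardless of version.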
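Another option, sketched here as an assumption rather than taken from the original answer, is to fix the levels inside factor() so every group's piece carries the same level set. The pieces then combine as factors directly, without a separate conversion step (the final ungroup() only drops the grouping):

```r
library(dplyr)

df <- data.frame(id = as.factor(c(1, 1, 1, 2, 2, 3, 4, 4)),
                 partner = c(1, 2, 3, 1, 2, 1, 1, 2),
                 kiss = as.factor(c("Yes", "No", "No", "No", "No", "Yes", "Yes", "No")))

df_new <- df %>%
  group_by(id) %>%
  # explicit levels: every group's piece has levels c("No", "Yes"),
  # even when that group only ever produces one of the two values
  mutate(neverkiss = factor(if (any(kiss == "Yes")) "No" else "Yes",
                            levels = c("No", "Yes"))) %>%
  ungroup()

levels(df_new$neverkiss)
```

Listing the levels explicitly also pins their order, which can matter later for models or plots.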
We can also do it with data.table:
library(data.table)
# setDT() converts df to a data.table by reference; the if/else is evaluated once per id
setDT(df)[, neverkiss := if (any(kiss == "Yes")) "No" else "Yes", by = id]
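Since the question is also open to non-dplyr ideas, here is a base R sketch using ave(), which applies a function within each group and writes the result back to every row of that group, so no reshaping is needed:

```r
df <- data.frame(id = as.factor(c(1, 1, 1, 2, 2, 3, 4, 4)),
                 partner = c(1, 2, 3, 1, 2, 1, 1, 2),
                 kiss = as.factor(c("Yes", "No", "No", "No", "No", "Yes", "Yes", "No")))

# TRUE on every row whose person kissed at least one partner
kissed_any <- ave(df$kiss == "Yes", df$id, FUN = any)

# neverkiss is the inverse, stored as a factor with fixed levels
df$neverkiss <- factor(ifelse(kissed_any, "No", "Yes"), levels = c("No", "Yes"))
df
```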