Dplyr solution for difference in row values based on two factor levels in separate columns

Tags: dataframe, r, dplyr

I am trying to use dplyr to calculate the difference between two row values based on factor levels in a large data frame. In practical terms, I want the vote distance between each pair of groups, for each party within each country. For the data below, I would like to end up with a data frame whose rows give the difference between the vote values for each group pair, for each party level within each country level. The lag function does not seem to work with my data because the number of factor levels varies by country, meaning each country has a different total number of groups and parties. A small sample of the setup is below.

df1 <- data.frame(id = c(1:12),
                 country = c("a","a","a","a","a","a","b","b","b","b","b","b"),
                 group =   c("x","y","z","x","y","z","x","y","z","x","y","z"),
                 party =   c("d","d","d","e","e","e","d","d","d","e","e","e"),
                 vote =    c(.15,.02,.7, .5, .6, .22,.47,.33,.09,.83,.77,.66))

This is how I would like the end product to look.

df2 <- data.frame(id= c(1:12),
                  country = c("a","a","a","b","b","b","a","a","a","b","b","b"),
                  group1 =  c("x","x","y","x","x","y","x","x","y","x","x","y"),
                  group2 =  c("y","z","z","y","z","z","y","z","z","y","z","z"),
                  party =   c("d","d","d","d","d","d","e","e","e","e","e","e"),
                  dist =  c(.13,-.55,-.68,.14,.38,.24,-.1,.28,.38,.06,.17,.11))

I have tried dcast previously, but when I fill with the column I want, the values don't line up and I get NA or 0 where there should be values. The lag function doesn't work in my case because the number of parties and groups differs by country rather than being fixed. Whenever I have tried different intervals for the lag, some of the comparisons end up across countries or across parties rather than across groups.

I have found solutions outside of dplyr, but for parsimony in presenting code I am wondering if there is a way to do this in dplyr. Also, the code I have is incredibly long and clunky, and uses six or seven packages just for this problem.

Thanks

shay asked Mar 04 '23 00:03


1 Answer

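We can use combn to compute the difference for every pair of vote values within each country and party. Note that mutate only works here because every country/party combination has exactly three groups, so the number of pairs (choose(3, 2) = 3) happens to equal the number of rows in each group.

library(dplyr)
df1 %>%
    group_by(country, party) %>%
    # combn() visits the pairs in the order (1,2), (1,3), (2,3), i.e. x-y, x-z, y-z
    mutate(dist = combn(vote, 2, FUN = function(x) x[1] - x[2]))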
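
Since the question says the number of groups varies by country, a variant that does not depend on the pair count matching the row count may be useful. The following is a minimal sketch, not part of the original answer. It assumes dplyr 1.1.0 or later (for reframe) and that every country/party combination has at least two groups, since combn errors otherwise. The name pair_dist is made up for illustration.

```r
library(dplyr)

# Sample data from the question; stringsAsFactors = FALSE keeps group as character
df1 <- data.frame(id = 1:12,
                  country = rep(c("a", "b"), each = 6),
                  group   = rep(c("x", "y", "z"), times = 4),
                  party   = rep(rep(c("d", "e"), each = 3), times = 2),
                  vote    = c(.15, .02, .7, .5, .6, .22, .47, .33, .09, .83, .77, .66),
                  stringsAsFactors = FALSE)

# reframe() may return any number of rows per group, so this also covers
# country/party combinations with more (or fewer) than three groups.
pair_dist <- df1 %>%
  group_by(country, party) %>%
  reframe(
    group1 = combn(group, 2)[1, ],  # first member of each pair
    group2 = combn(group, 2)[2, ],  # second member of each pair
    dist   = combn(vote, 2, FUN = function(x) x[1] - x[2])
  )

pair_dist
```

The result has one row per group pair within each country and party, with group1 and group2 labelling the pair as in the desired df2. For country a, party d, the x-z row is .15 - .7 = -.55. Rows come out ordered by country and then party, so an arrange(party, country) would be needed to match the row order of df2.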

akrun answered Mar 05 '23 16:03