Count the amount of times value A occurs without value B and vice versa

Question

I'm having trouble figuring out how to do the opposite of the answer to this question (and in R not python).

Count the amount of times value A occurs with value B

Basically I have a dataframe with a lot of combinations of pairs of columns like so:

df <- data.frame(id1 = c("1","1","1","1","2","2","2","3","3","4","4"),
                 id2 = c("2","2","3","4","1","3","4","1","4","2","1"))

I want to count, how often all the values in column A occur in the whole dataframe without the values from column B. So the results for this small example would be the output of:

df_result <- data.frame(id1 = c("1","1","1","2","2","2","3","3","4","4"),
                        id2 = c("2","3","4","1","3","4","1","4","2","1"),
                        count = c("4","5","5","3","5","4","2","3","3","3"))

The important criteria for this, is that the final results dataframe is collapsed by the pairs (so in my example rows 1 and 2 are duplicates, and they are collapsed and summed by the total frequency 1 is observed without 2). For tallying the count of occurances, it's important that both columns are examined. I.e. order of columns doesn't matter for calculating the frequency - if column A has 1 and B has 2, this counts the same as if column A has 2 and B has 1.

I can do this very slowly by filtering for each pair, but it's not really feasible for my real data where I have many many different pairs.

Any guidance is greatly appreciated.

benson23 · Accepted Answer

First paste the two id columns together to id12 for later matching. Then use sapply to go through all rows to see the records where id1 appears in id12 but id2 doesn't. sum that value and only output the distinct records. Finally, remove the id12 column.

library(dplyr)

df %>% mutate(id12 = paste0(id1, id2),
              count = sapply(1:nrow(.), 
                             function(x) 
                               sum(grepl(id1[x], id12) & !grepl(id2[x], id12)))) %>% 
  distinct() %>% 
  select(-id12)

Or in base R completely:

id12 <- paste0(df$id1, df$id2)
df$count <- sapply(1:nrow(df), function(x) sum(grepl(df$id1[x], id12) & !grepl(df$id2[x], id12)))
df <- df[!duplicated(df),]

Output

   id1 id2 count
1    1   2     4
2    1   3     5
3    1   4     5
4    2   1     3
5    2   3     5
6    2   4     4
7    3   1     2
8    3   4     3
9    4   2     3
10   4   1     3

Maël · Answer

A full tidyverse version:

library(tidyverse)
df %>% 
  mutate(id = paste(id1, id2),
         count = map(cur_group_rows(), ~ sum(str_detect(id, id1[.x]) & str_detect(id, id2[.x], negate = T))))

Count the amount of times value A occurs without value B and vice versa

Tags:

r

dplyr

tidyverse

cebola

2 Answers

Output

benson23

Maël

Recent Activity

Donate For Us

Count the amount of times value A occurs without value B and vice versa

Tags:

r

dplyr

tidyverse

cebola

2 Answers

Output

benson23

Maël

Related questions

Recent Activity

Donate For Us