
Count and filter data based on paired data/every two rows?

I am trying to set up a McNemar test in R, but I am not very experienced at coding.

My data is paired and 1000 pairs long, so I have a column specifying the pair number, like:

c(0, 0, 1, 1, 2, 2, 3, 3, 4, 4)

A column specifying whether each member of the pair is in the control or the treatment group (each pair has one of each, in random order), like:

c(0, 1, 1, 0, 1, 0, 0, 1, 0, 1)

And there is a column called response, in which neither, one, or both of the members of the pair could receive a response like:

c(0, 1, 1, 1, 1, 0, 0, 0, 0, 1)

I am trying to create a matrix counting up the results, like:

a <- count of pairs in which both members received a response
b <- count of pairs in which the control only received a response
c <- treatment only response
d <- Neither response
matrix(c(a, b, c, d), 2, 2)

What lines of code could I run to filter my data to get a, b, c, and d? I have been trying to use the tidyverse package, so the answer could use either base R or tidyverse.

KVHelpMe asked Mar 06 '26 10:03



2 Answers

This approach with tidyverse/dplyr works:

1. Loading your data:

library(tidyverse)

# Sample data from the question
pair <- c(0, 0, 1, 1, 2, 2, 3, 3, 4, 4)
# This answer treats 0 as control and 1 as treatment
treat <- c(0, 1, 1, 0, 1, 0, 0, 1, 0, 1)
response <- c(0, 1, 1, 1, 1, 0, 0, 0, 0, 1)
data <- data.frame(pair, treat, response)

2. Computing the counts you want:

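d <- data %>%
    # Total number of responses within each pair (0, 1, or 2)
    group_by(pair) %>%
    mutate(total_response = sum(response)) %>%
    ungroup() %>%
    # Flag each row with the category its pair falls into
    mutate(
        # a: both members of the pair responded
        a = case_when(total_response == 2 ~ 1, TRUE ~ 0),
        # b: only the control member (treat == 0) responded
        b = case_when(total_response == 1 & treat == 0 & response == 1 ~ 1, TRUE ~ 0),
        # c: only the treated member (treat == 1) responded
        c = case_when(total_response == 1 & treat == 1 & response == 1 ~ 1, TRUE ~ 0),
        # d: neither member responded
        d = case_when(total_response == 0 ~ 1, TRUE ~ 0)
    ) %>%
    # Collapse to one row of flags per pair
    group_by(pair) %>%
    summarise(a = max(a), b = max(b), c = max(c), d = max(d)) %>%
    ungroup() %>%
    # Count the pairs in each category
    summarise(a = sum(a), b = sum(b), c = sum(c), d = sum(d))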

3. Your matrix:

matrix(c(d$a, d$b, d$c, d$d), 2, 2)
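If the matrix is meant for `mcnemar.test()`, labelling its dimensions may help avoid mixing up the cells. The following is only a sketch: it hardcodes the counts a = 1, b = 0, c = 3, d = 1 that the ten-row sample data produces so that it runs on its own, and the names `counts`, `tab`, and `res` are made up for illustration. Because `matrix()` fills column by column, the rows end up corresponding to the treatment member and the columns to the control member:

```r
# Counts for the ten-row sample data (hardcoded assumption so the
# snippet is self-contained; in practice take them from the step above).
counts <- c(a = 1, b = 0, c = 3, d = 1)

# matrix() fills column-wise: column 1 = (a, b), column 2 = (c, d).
# So rows index the treatment member, columns index the control member.
tab <- matrix(
  counts, nrow = 2, ncol = 2,
  dimnames = list(
    treatment = c("response", "no response"),
    control   = c("response", "no response")
  )
)
print(tab)

# McNemar's test only uses the two discordant cells (b and c).
res <- mcnemar.test(tab)
print(res)
```

With so few discordant pairs the test result is not meaningful; the sample data only serves to check the layout.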

4. Explaining the computations:

  1. First, sum the responses within each pair;
  2. Then ungroup and flag each row: a = 1 when both members of the pair responded; b = 1 when only one member responded and it was the control; c = 1 when only one member responded and it was the treated one; d = 1 when neither responded;
  3. Then group by pair again and take the max of each flag, which leaves a single row of flags per pair;
  4. Finally, ungroup and sum each flag, which is equivalent to counting the pairs in each category.
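The four steps above could probably be collapsed by first reducing each pair to a single row and then counting. This is a rough sketch, not a replacement for the code above: it assumes every pair has exactly one control row (treat == 0) and one treatment row (treat == 1), and `per_pair`, `control_resp`, `treatment_resp`, and `counts` are names invented for this illustration:

```r
library(dplyr)

# Sample data from the question (0 = control, 1 = treatment is assumed).
data <- data.frame(
  pair     = c(0, 0, 1, 1, 2, 2, 3, 3, 4, 4),
  treat    = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 1),
  response = c(0, 1, 1, 1, 1, 0, 0, 0, 0, 1)
)

# One row per pair: the control member's response and the treated member's.
per_pair <- data %>%
  group_by(pair) %>%
  summarise(
    control_resp   = response[treat == 0],
    treatment_resp = response[treat == 1],
    .groups = "drop"
  )

# Count the pairs falling into each of the four categories.
counts <- per_pair %>%
  summarise(
    a = sum(control_resp == 1 & treatment_resp == 1),  # both responded
    b = sum(control_resp == 1 & treatment_resp == 0),  # control only
    c = sum(control_resp == 0 & treatment_resp == 1),  # treatment only
    d = sum(control_resp == 0 & treatment_resp == 0)   # neither
  )

print(counts)
```

If a pair is missing a member or has duplicates, the per-pair `summarise()` no longer returns one value per group, so it is worth checking the data first.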
LuizZ answered Mar 09 '26 02:03



Assume that your data frame looks like this:

> d
   group treatment response
1      0         0        0
2      0         1        1
3      1         1        1
4      1         0        1
5      2         1        1
6      2         0        0
7      3         0        0
8      3         1        0
9      4         0        0
10     4         1        1

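Then you can try something like this (it relies on the rows being sorted by group, so that the control and treatment responses line up pair by pair):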

d <- within(d, {
  response <- factor(response, levels = c(1, 0), labels = c("positive", "negative"))
  treatment <- as.logical(treatment)
})

with(d, table(response[!treatment], response[treatment], dnn = c("control", "treatment")))

Output

          treatment
control    positive negative
  positive        1        0
  negative        3        1
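The cross-tabulation pairs the i-th control response with the i-th treatment response, so it depends on row order. If the rows are not already sorted by pair, sorting first should restore the alignment. A small sketch under that assumption, using the same sample data with the rows deliberately scrambled (the scrambling order and the name `tab` are arbitrary choices for this illustration):

```r
# Same sample data as above.
d <- data.frame(
  group     = c(0, 0, 1, 1, 2, 2, 3, 3, 4, 4),
  treatment = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 1),
  response  = c(0, 1, 1, 1, 1, 0, 0, 0, 0, 1)
)

# Scramble the rows to mimic unsorted input (arbitrary permutation).
d <- d[c(10, 3, 7, 1, 6, 9, 2, 5, 8, 4), ]

# Sort by pair id so control and treatment rows line up pair by pair.
d <- d[order(d$group), ]

d <- within(d, {
  response <- factor(response, levels = c(1, 0),
                     labels = c("positive", "negative"))
  treatment <- as.logical(treatment)
})

# Rows: control member's response; columns: treated member's response.
tab <- with(d, table(response[!treatment], response[treatment],
                     dnn = c("control", "treatment")))
print(tab)
```

The resulting table can then be passed directly to `mcnemar.test(tab)`.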
ekoam answered Mar 09 '26 02:03
