
In R, collapse over multiple logical rows of the same ID into 1 row

Problem: To do some survey analysis on prescription drug use in R, I need to turn multiple rows of the same person (ID) into one, indicating TRUE if any of said rows has TRUE in it.

Here's the data:

df <- data.frame(ID = c("a","a","a","a","a","a"), 
                cardiovasc = c(T,T,T,T,T,T), 
                beta_blockers = c(F,F,F,F,F,F),
                antibiotics = c(T,F,F,F,F,F),
                stringsAsFactors=FALSE)

Here's what I'd like it to look like:

goal <- data.frame(ID = c("a"),
                    cardiovasc = c(T), 
                    beta_blockers = c(F),
                    antibiotics = c(T),
                    stringsAsFactors=FALSE)

As you can tell, even though df$antibiotics only had 1 TRUE in the dataset, I'd like that to count as TRUE when the ID has been collapsed into one row.

What I've tried:

Mainly, I've been trying to work off this post, and while I feel I'm close, I nevertheless get an error. Here's my attempt:

df <- df[, lapply(.SD, paste0, collapse=""), by=ID]

This yields the error: unused argument (by = ID). I've tried another approach from the same post, but it's even messier and requires converting the data to a data.table. I need to keep things as a data.frame.

Any ideas?

asked Dec 31 '22 14:12 by logjammin


2 Answers

We can use any() instead of paste0(): any() returns TRUE if at least one element of a column is TRUE, and here it is applied to each column within each 'ID' group.

library(data.table)
# setDT() converts df to a data.table by reference; any() then runs on each column per ID
setDT(df)[, lapply(.SD, any), ID]

Output:

#   ID cardiovasc beta_blockers antibiotics
#1:  a       TRUE         FALSE        TRUE
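
Since the question asks to keep a data.frame, the same idea can also be sketched in base R with aggregate(), which avoids the conversion to data.table altogether. This is only an illustrative alternative, not part of the answer above; the name res is made up for the example.

```r
# Same data as in the question
df <- data.frame(ID = c("a", "a", "a", "a", "a", "a"),
                 cardiovasc = rep(TRUE, 6),
                 beta_blockers = rep(FALSE, 6),
                 antibiotics = c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE),
                 stringsAsFactors = FALSE)

# aggregate() splits every non-ID column by ID and applies any() to each piece
res <- aggregate(df[-1], by = list(ID = df$ID), FUN = any)

res                 # one row per ID
is.data.frame(res)  # the result stays a plain data.frame
```

Passing the columns and the grouping variable separately (rather than a formula such as . ~ ID) sidesteps the formula method's default na.action, which drops rows containing NA before aggregating.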
answered Apr 30 '23 20:04 by akrun


Or you can use this tidyverse solution:

library(dplyr)

df %>%
  group_by(ID) %>%
  summarise(across(cardiovasc:antibiotics, ~ any(.x)))

# A tibble: 1 x 4
  ID    cardiovasc beta_blockers antibiotics
  <chr> <lgl>      <lgl>         <lgl>      
1 a     TRUE       FALSE         TRUE

Update: Thanks to @Ray for raising a likely scenario. If the columns hold 1 and 0 instead of TRUE and FALSE, and may also contain NA values, we could use the following solution:

df %>%
  group_by(ID) %>%
  summarise(across(cardiovasc:antibiotics, ~ any(.x[!is.na(.x)] == 1)))

# A tibble: 1 x 4
  ID    cardiovasc beta_blockers antibiotics
  <chr> <lgl>      <lgl>         <lgl>      
1 a     TRUE       FALSE         TRUE  

Data

df <- data.frame(ID = c("a","a","a","a","a","a"), 
                 cardiovasc = c(1,1,1,1,1,1), 
                 beta_blockers = c(0,0,0,0,0,0),
                 antibiotics = c(1,0,0,0,0,0),
                 stringsAsFactors=FALSE)
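
To see why the NA handling in the update matters, here is a small base R sketch of how any() treats missing values. The vectors x and y and the result names r1 to r4 are hypothetical and only serve the illustration.

```r
# Hypothetical vectors mixing 1/0 codes with a missing value
x <- c(0, NA, 1)
y <- c(0, NA)

r1 <- any(x == 1)                # a definite match exists, so the NA does not matter
r2 <- any(y == 1)                # no match found and the NA might have been one
r3 <- any(y == 1, na.rm = TRUE)  # the NA is ignored

# Dropping NAs by subsetting and using na.rm = TRUE agree on the same input
r4 <- identical(any(y[!is.na(y)] == 1), any(y == 1, na.rm = TRUE))

r1  # TRUE
r2  # NA
r3  # FALSE
r4  # TRUE
```

Without removing the NAs, a group with no 1 but at least one NA would come out as NA rather than FALSE, which is usually not what a collapsed indicator column should contain.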
answered Apr 30 '23 21:04 by Anoushiravan R