
Conditional grouping and summarizing data frame in [R]

Tags:

r

dplyr

I have a data frame like this:

df <- data.frame(ID = c("A", "A", "B", "B", "C", "C"), 
                 time = c(3.1,3.2,6.5,12.3, 3.2, 3.4), 
                 intensity = c(10, 20, 30, 40, 50, 60))

|ID | time| intensity|
|:--|----:|---------:|
|A  |  3.1|        10|
|A  |  3.2|        20|
|B  |  6.5|        30|
|B  | 12.3|        40|
|C  |  3.2|        50|
|C  |  3.4|        60|

I would like to aggregate values (sum the intensities) by ID, but only when the time difference within an ID is below a threshold, e.g. 0.3. First I calculated this time difference:

library(dplyr)

df.2 <- df %>% 
        group_by(ID) %>% 
        mutate(time.diff = max(time) - min(time)) 

...resulting in:

|ID | time| intensity| time.diff|
|:--|----:|---------:|---------:|
|A  |  3.1|        10|       0.1|
|A  |  3.2|        20|       0.1|
|B  |  6.5|        30|       5.8|
|B  | 12.3|        40|       5.8|
|C  |  3.2|        50|       0.2|
|C  |  3.4|        60|       0.2|

Just to be clear, what I would like to get as an output would be:

|ID | time| intensity| time.diff|
|:--|----:|---------:|---------:|
|A  | 3.15|        30|       0.1|
|B  |  6.5|        30|       5.8|
|B  | 12.3|        40|       5.8|
|C  |  3.3|       110|       0.2|

where time is now the average of the merged observations and intensity is their sum. ID "B" keeps both observations, since its time difference is larger than 0.3. I have tried with dplyr, but summarise always collapses the two "B" observations into one, whereas I want to keep both, and I don't know how to do a conditional _group_by_.

Thank you for any ideas!

mesontau asked Oct 20 '25 17:10


1 Answer

A possible option with data.table:

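library(data.table)
# setDT() converts df by reference; rows of IDs with time.diff <= 0.3 are
# overwritten with the group mean/sum, and unique() drops the resulting duplicates
unique(setDT(df)[, time.diff := max(time)-min(time), ID][
   time.diff <= 0.3, c('time', 'intensity') := list(mean(time),
        sum(intensity)), ID]) 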
#    ID  time intensity time.diff
#1:  A  3.15        30       0.1
#2:  B  6.50        30       5.8
#3:  B 12.30        40       5.8
#4:  C  3.30       110       0.2

Or using dplyr:

library(dplyr)
df %>% 
   group_by(ID) %>%
   # indx flags groups whose time spread is within the threshold
   mutate(time.diff=max(time)-min(time), indx=all(time.diff<=0.3),
         intensity=ifelse(indx, sum(intensity), intensity),
         time=ifelse(indx, mean(time), time)) %>% 
   # flagged groups keep only their first (now aggregated) row
   filter(!indx|row_number()==1) %>%
   select(-indx)
 #  ID  time intensity time.diff
 #1  A  3.15        30       0.1
 #2  B  6.50        30       5.8
 #3  B 12.30        40       5.8
 #4  C  3.30       110       0.2
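
The "conditional group_by" asked about in the question can also be approximated with an extra grouping key. What follows is only a sketch, not a tested drop-in: the `key` column and the `threshold` variable are names made up for this illustration, and `.groups = "drop"` assumes dplyr 1.0.0 or later. Groups whose spread is within the threshold share a single key (so they collapse on summarise), while the other groups get one key per row (so every row survives).

```r
library(dplyr)

df <- data.frame(ID = c("A", "A", "B", "B", "C", "C"),
                 time = c(3.1, 3.2, 6.5, 12.3, 3.2, 3.4),
                 intensity = c(10, 20, 30, 40, 50, 60))

threshold <- 0.3  # hypothetical name; same cutoff as in the question

res <- df %>%
  group_by(ID) %>%
  mutate(time.diff = max(time) - min(time),
         # hypothetical helper column: one shared key (0) when the group
         # should be collapsed, otherwise a distinct key for each row
         key = if (time.diff[1] <= threshold) 0L else row_number()) %>%
  # time.diff is constant within an ID, so grouping by it only carries it along
  group_by(ID, key, time.diff) %>%
  summarise(time = mean(time), intensity = sum(intensity), .groups = "drop") %>%
  select(ID, time, intensity, time.diff)

print(res)
```

With the example data this should give the same four rows as the output above. Unlike the `mutate` + `filter` version, the aggregation happens in a single `summarise`, at the cost of a temporary key column.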
akrun answered Oct 23 '25 06:10


