Conditional grouping and summarizing data frame in [R]

Question

I have a data frame like this:

df <- data.frame(ID = c("A", "A", "B", "B", "C", "C"), 
                 time = c(3.1,3.2,6.5,12.3, 3.2, 3.4), 
                 intensity = c(10, 20, 30, 40, 50, 60))

|ID | time| intensity|
|:--|----:|---------:|
|A  |  3.1|        10|
|A  |  3.2|        20|
|B  |  6.5|        30|
|B  | 12.3|        40|
|C  |  3.2|        50|
|C  |  3.4|        60|

I would like to aggregate values (sum the intensities) by ID, but only when the time difference within an ID is less than some threshold, e.g. 0.3. First I calculated this time difference:

library(dplyr)

df.2 <- df %>% 
        group_by(ID) %>% 
        mutate(time.diff = max(time) - min(time))

...resulting in:

|ID | time| intensity| time.diff|
|:--|----:|---------:|---------:|
|A  |  3.1|        10|       0.1|
|A  |  3.2|        20|       0.1|
|B  |  6.5|        30|       5.8|
|B  | 12.3|        40|       5.8|
|C  |  3.2|        50|       0.2|
|C  |  3.4|        60|       0.2|

To be clear, the output I would like to get is:

|ID | time| intensity| time.diff|
|:--|----:|---------:|---------:|
|A  | 3.15|        30|       0.1|
|B  |  6.5|        30|       5.8|
|B  | 12.3|        40|       5.8|
|C  |  3.3|       110|       0.2|

Here time is the average of the merged observations and intensity is their sum. ID "B" keeps both observations, since its time difference is greater than 0.3. I have tried dplyr, but summarise always collapses the two "B" rows into one, whereas I want to keep them, and I don't know how to do a conditional _group_by_.

Thanks for any ideas!

akrun · Accepted Answer

A possible option with data.table:

library(data.table)
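# setDT() converts df to a data.table by reference; time.diff is computed per ID.
# For IDs with time.diff <= 0.3, every row is overwritten with the group mean/sum,
# and unique() then collapses those now-identical rows.
unique(setDT(df)[, time.diff := max(time) - min(time), by = ID][
   time.diff <= 0.3, c('time', 'intensity') := list(mean(time),
        sum(intensity)), by = ID])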
#    ID  time intensity time.diff
#1:  A  3.15        30       0.1
#2:  B  6.50        30       5.8
#3:  B 12.30        40       5.8
#4:  C  3.30       110       0.2
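
One thing to be aware of: setDT() converts df in place and := modifies it by reference, so after running the snippet above df itself carries the overwritten time and intensity values. If the original data frame is still needed, a minimal sketch along the following lines works on a copy instead. The names dt and res are only illustrative, and the <= 0.3 comparison is kept as in the answer (the question asks for strictly less than 0.3, which gives the same result on this data).

```r
library(data.table)

df <- data.frame(ID = c("A", "A", "B", "B", "C", "C"),
                 time = c(3.1, 3.2, 6.5, 12.3, 3.2, 3.4),
                 intensity = c(10, 20, 30, 40, 50, 60))

# as.data.table() returns a copy, so df is left as a plain, unmodified data.frame
dt <- as.data.table(df)

# same steps as in the answer, applied to the copy
dt[, time.diff := max(time) - min(time), by = ID]
dt[time.diff <= 0.3,
   c("time", "intensity") := list(mean(time), sum(intensity)),
   by = ID]

# collapse the rows that are now identical within an ID
res <- unique(dt)
print(res)
```

With the sample data this should give the same four rows as shown above, while df keeps its original six rows and three columns.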

Or using dplyr:

library(dplyr)
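df %>% 
   group_by(ID) %>%
   # indx is TRUE for every row of an ID whose time span is within the threshold
   mutate(time.diff = max(time) - min(time), indx = all(time.diff <= 0.3),
         intensity = ifelse(indx, sum(intensity), intensity),
         time = ifelse(indx, mean(time), time)) %>% 
   # keep all rows of the other IDs, but only the first row of each collapsed ID
   filter(!indx | row_number() == 1) %>%
   select(-indx)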
 #  ID  time intensity time.diff
 #1  A  3.15        30       0.1
 #2  B  6.50        30       5.8
 #3  B 12.30        40       5.8
 #4  C  3.30       110       0.2
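
Since the question asks specifically about a conditional _group_by_, another way to read the problem is to build an extra grouping key: rows of a "close" ID share one key, while rows of the other IDs each get their own. The sketch below is one possible variant, not part of the answer above. The names threshold, grp and res are made up for illustration, it uses the strict < 0.3 from the question, and the .groups argument of summarise assumes dplyr 1.0.0 or later.

```r
library(dplyr)

df <- data.frame(ID = c("A", "A", "B", "B", "C", "C"),
                 time = c(3.1, 3.2, 6.5, 12.3, 3.2, 3.4),
                 intensity = c(10, 20, 30, 40, 50, 60))

threshold <- 0.3  # illustrative name for the cut-off from the question

res <- df %>%
  group_by(ID) %>%
  mutate(time.diff = max(time) - min(time),
         # one shared key when the ID's time span is small, else one key per row
         grp = if (time.diff[1] < threshold) 1L else row_number()) %>%
  # time.diff is constant within an ID, so grouping by it just carries it along
  group_by(ID, grp, time.diff) %>%
  summarise(time = mean(time), intensity = sum(intensity), .groups = "drop") %>%
  select(ID, time, intensity, time.diff)

print(res)
```

With this approach summarise does the collapsing, so no filter on row_number() is needed, and the same pattern extends to other summary functions.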

Tags:

r

dplyr

mesontau