R function to combine rows based on duplicate times

I have a large dataset with duplicated times (rows). Each duplicate row holds data in different columns, and I would like to combine them into a single row. The data looks like this:

date              P1    PT1    P2    PT2    P3    PT3
5/5/2011@11:40    NA    NA     NA    NA     9.4   10.1
5/5/2011@11:40    5.6   10.2   8.5   10.1   NA    NA

I would like to get to this:

date              P1    PT1    P2    PT2    P3    PT3
5/5/2011@11:40    5.6   10.2   8.5   10.1   9.4   10.1

My dataset is 10-minute data spanning ten years, and the duplicates occur somewhat randomly. The @ sign was added so that the date displays properly.

I've tried rbind and rbind.row.names to no avail.

Thanks!

schultz45 asked Mar 03 '26 19:03


1 Answer

You can use the summarize() function in dplyr. The following will work, but it does not check for conflicting duplicates; it simply takes the maximum value of each column for each date.

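library(dplyr)

# Example data: two rows share the same date, each filling different columns
df <- tribble(
    ~date,            ~P1, ~PT1, ~P2, ~PT2, ~P3, ~PT3,
    "5/5/2011@11:40", NA,  NA,   NA,  NA,   9.4, 10.1,
    "5/5/2011@11:40", 5.6, 10.2, 8.5, 10.1, NA,  NA
)

# Collapse each date to one row, keeping the maximum non-NA value per column.
# An explicit function is used because passing extra arguments through
# across() is deprecated in recent dplyr versions.
df %>%
    group_by(date) %>%
    summarize(across(starts_with("P"), function(x) max(x, na.rm = TRUE)))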

In other words, it will work if you are sure that, for each date, every column holds at most one number and the other duplicate rows are NA in that column.
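If you are not sure of that, a check can be run before combining. The following is only a sketch under the same column layout as the example data; the helper `first_non_na` and the object names `conflicts` and `combined` are made up for illustration and are not part of dplyr. It also returns NA when every value in a group is missing, whereas max() with na.rm = TRUE returns -Inf (with a warning) in that case.

```r
library(dplyr)

df <- tribble(
  ~date,            ~P1, ~PT1, ~P2, ~PT2, ~P3, ~PT3,
  "5/5/2011@11:40", NA,  NA,   NA,  NA,   9.4, 10.1,
  "5/5/2011@11:40", 5.6, 10.2, 8.5, 10.1, NA,  NA
)

# Hypothetical helper: first non-NA value of x, or NA if all values are missing
first_non_na <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) NA_real_ else x[1]
}

# Dates where at least one column has more than one distinct non-NA value
conflicts <- df %>%
  group_by(date) %>%
  summarize(across(starts_with("P"),
                   function(x) n_distinct(x, na.rm = TRUE) > 1)) %>%
  filter(if_any(starts_with("P"), identity))

# One row per date, taking the first non-NA value in each column
combined <- df %>%
  group_by(date) %>%
  summarize(across(starts_with("P"), first_non_na))
```

If `conflicts` has zero rows, no date has disagreeing values and `combined` loses nothing; otherwise the listed dates would need to be inspected by hand before deciding which value to keep.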

mikebader answered Mar 05 '26 08:03