I have a large dataset with duplicated timestamps (rows). Each duplicate row holds data in different columns, and I would like to combine them into a single row. The data looks like this:
date P1 PT1 P2 PT2 P3 PT3
5/5/2011@11:40 NA NA NA NA 9.4 10.1
5/5/2011@11:40 5.6 10.2 8.5 10.1 NA NA
I would like to get to this:
date P1 PT1 P2 PT2 P3 PT3
5/5/2011@11:40 5.6 10.2 8.5 10.1 9.4 10.1
My dataset is 10-minute data covering ten years, and the duplicates occur somewhat randomly. The @ sign was added only so the dates display properly.
I've tried rbind and rbind.row.names to no avail.
Thanks!
You can use the summarize() function in dplyr. The following works, but it does not check whether the duplicate rows conflict; it simply takes the maximum non-NA value in each column for each date.
library(dplyr)
df <- tribble(~date, ~P1, ~PT1, ~P2, ~PT2, ~P3, ~PT3,
"5/5/2011@11:40", NA, NA, NA, NA, 9.4, 10.1,
"5/5/2011@11:40", 5.6, 10.2, 8.5, 10.1, NA, NA
)
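# Collapse rows that share a date, keeping the largest non-NA value per column.
# The lambda form is used because passing na.rm through across() is deprecated in dplyr >= 1.1.0.
df %>%
  group_by(date) %>%
  summarize(across(starts_with("P"), ~ max(.x, na.rm = TRUE)))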
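In other words, it works as long as each column holds at most one non-NA value per date. Note that if a column is entirely NA for a date, max() with na.rm = TRUE returns -Inf and issues a warning.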
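If some dates may have a column that is NA in every duplicate row, or if you want to confirm that duplicates never conflict, a variation along these lines might help. This is only a sketch: `first_non_na` is a hypothetical helper (not part of dplyr), and the third row of the sample data is made up to illustrate the all-NA case.

```r
library(dplyr)

# Sample data from the question, plus one invented row (11:50) whose
# P2..PT3 columns are all NA, to show the all-NA case.
df <- tribble(
  ~date,            ~P1, ~PT1, ~P2, ~PT2, ~P3, ~PT3,
  "5/5/2011@11:40", NA,  NA,   NA,  NA,   9.4, 10.1,
  "5/5/2011@11:40", 5.6, 10.2, 8.5, 10.1, NA,  NA,
  "5/5/2011@11:50", 5.7, 10.3, NA,  NA,   NA,  NA
)

# Hypothetical helper: return the first non-NA value, or NA if there is none.
# Unlike max(na.rm = TRUE), it never produces -Inf.
first_non_na <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) NA_real_ else x[1]
}

# One row per date, keeping the first non-NA value in each column.
combined <- df %>%
  group_by(date) %>%
  summarize(across(starts_with("P"), first_non_na))

# Optional check: dates where some column has more than one non-NA value,
# i.e. where the duplicates overlap and the result would be ambiguous.
conflicts <- df %>%
  group_by(date) %>%
  summarize(across(starts_with("P"), ~ sum(!is.na(.x)) > 1)) %>%
  filter(if_any(starts_with("P"), ~ .x))
```

With this sample data, `conflicts` has no rows, so taking the first non-NA value and taking the maximum give the same result for the 11:40 timestamp.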