I have a dataframe, dd
Var1 Freq
76 2189- 1181
458 6186- 813
445 6170- 738
902 61801 650
74 2181- 618
504 6268- 509
905 61804 307
500 6259- 272
910 61889 265
495 6253- 242
73 2180- 224
510 6256- 6407
461 6180- 3254
792 2333
1 2312
467 6186- 2259
212 4019- 1254
4561 6170- 1162
462 6181- 1156
80 2189- 1154
465 6184- 1035
I would like to search for duplicates in the Var1 column and combine them, creating a sum of the frequencies int he Freq column such that...
Var1 Freq
76 2189- 2335
458 6186- 3072
445 6170- 1900
902 61801 650
74 2181- 618
504 6268- 509
905 61804 307
500 6259- 272
910 61889 265
495 6253- 242
73 2180- 224
510 6256- 6407
461 6180- 3254
792 2333
1 2312
212 4019- 1254
462 6181- 1156
465 6184- 1035
NOTE: So you don't have to go searching for the differences, the Var1 values combined were 2189-, 6186-, & 6170-.
I imagine this can be done with some fancy work with [] and duplicated(), but I can't seem to wrap my head around it. I'd appreciate any guidance.
The dput for this data can be found on pastebin.
This could be done with simply
aggregate(Freq ~ Var1, dd, sum)
Or with data.table
library(data.table)
setDT(dd)[, .(Freq = sum(Freq)), by = Var1]
Or with dplyr
library(dplyr)
dd %>%
group_by(Var1) %>%
summarise(Freq = sum(Freq))
though this will add up the empty entries of "" too, which is not clear why you treat them differently
Either way, with data.table you could reach your exact output using
setDT(dd)[, if(Var1 == "") .SD else sum(Freq), by = Var1]
# Var1 V1
# 1: 2189- 2335
# 2: 6186- 3072
# 3: 6170- 1900
# 4: 61801 650
# 5: 2181- 618
# 6: 6268- 509
# 7: 61804 307
# 8: 6259- 272
# 9: 61889 265
# 10: 6253- 242
# 11: 2180- 224
# 12: 6256- 6407
# 13: 6180- 3254
# 14: 2333
# 15: 2312
# 16: 4019- 1254
# 17: 6181- 1156
# 18: 6184- 1035
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With