Add frequency based on duplicate row

Question

I have a dataframe, dd

     Var1 Freq
76   2189- 1181
458  6186-  813
445  6170-  738
902  61801  650
74   2181-  618
504  6268-  509
905  61804  307
500  6259-  272
910  61889  265
495  6253-  242
73   2180-  224
510  6256- 6407
461  6180- 3254
792        2333
1          2312
467  6186- 2259
212  4019- 1254
4561 6170- 1162
462  6181- 1156
80   2189- 1154
465  6184- 1035

I would like to search for duplicates in the Var1 column and combine them, creating a sum of the frequencies int he Freq column such that...

     Var1 Freq
76   2189- 2335
458  6186- 3072
445  6170- 1900
902  61801  650
74   2181-  618
504  6268-  509
905  61804  307
500  6259-  272
910  61889  265
495  6253-  242
73   2180-  224
510  6256- 6407
461  6180- 3254
792        2333
1          2312
212  4019- 1254
462  6181- 1156
465  6184- 1035

NOTE: So you don't have to go searching for the differences, the Var1 values combined were 2189-, 6186-, & 6170-.

I imagine this can be done with some fancy work with [] and duplicated(), but I can't seem to wrap my head around it. I'd appreciate any guidance.

The dput for this data can be found on pastebin.

David Arenburg · Accepted Answer

This could be done with simply

aggregate(Freq ~ Var1, dd, sum)

Or with data.table

library(data.table)
setDT(dd)[, .(Freq = sum(Freq)), by = Var1]

Or with dplyr

library(dplyr)
dd %>%
  group_by(Var1) %>%
  summarise(Freq = sum(Freq))

though this will add up the empty entries of "" too, which is not clear why you treat them differently

Either way, with data.table you could reach your exact output using

setDT(dd)[, if(Var1 == "") .SD else sum(Freq), by = Var1]
#      Var1   V1
#  1: 2189- 2335
#  2: 6186- 3072
#  3: 6170- 1900
#  4: 61801  650
#  5: 2181-  618
#  6: 6268-  509
#  7: 61804  307
#  8: 6259-  272
#  9: 61889  265
# 10: 6253-  242
# 11: 2180-  224
# 12: 6256- 6407
# 13: 6180- 3254
# 14:       2333
# 15:       2312
# 16: 4019- 1254
# 17: 6181- 1156
# 18: 6184- 1035

Add frequency based on duplicate row

Tags:

r

Ryan

1 Answers

David Arenburg

Recent Activity

Donate For Us

Add frequency based on duplicate row

Tags:

r

Ryan

1 Answers

David Arenburg

Related questions

Recent Activity

Donate For Us