Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Add frequency based on duplicate row

Tags:

r

I have a dataframe, dd

     Var1 Freq
76   2189- 1181
458  6186-  813
445  6170-  738
902  61801  650
74   2181-  618
504  6268-  509
905  61804  307
500  6259-  272
910  61889  265
495  6253-  242
73   2180-  224
510  6256- 6407
461  6180- 3254
792        2333
1          2312
467  6186- 2259
212  4019- 1254
4561 6170- 1162
462  6181- 1156
80   2189- 1154
465  6184- 1035

I would like to search for duplicates in the Var1 column and combine them, creating a sum of the frequencies int he Freq column such that...

     Var1 Freq
76   2189- 2335
458  6186- 3072
445  6170- 1900
902  61801  650
74   2181-  618
504  6268-  509
905  61804  307
500  6259-  272
910  61889  265
495  6253-  242
73   2180-  224
510  6256- 6407
461  6180- 3254
792        2333
1          2312
212  4019- 1254
462  6181- 1156
465  6184- 1035

NOTE: So you don't have to go searching for the differences, the Var1 values combined were 2189-, 6186-, & 6170-.

I imagine this can be done with some fancy work with [] and duplicated(), but I can't seem to wrap my head around it. I'd appreciate any guidance.

The dput for this data can be found on pastebin.

like image 953
Ryan Avatar asked Jun 28 '26 07:06

Ryan


1 Answers

This could be done with simply

aggregate(Freq ~ Var1, dd, sum)

Or with data.table

library(data.table)
setDT(dd)[, .(Freq = sum(Freq)), by = Var1]

Or with dplyr

library(dplyr)
dd %>%
  group_by(Var1) %>%
  summarise(Freq = sum(Freq))

though this will add up the empty entries of "" too, which is not clear why you treat them differently

Either way, with data.table you could reach your exact output using

setDT(dd)[, if(Var1 == "") .SD else sum(Freq), by = Var1]
#      Var1   V1
#  1: 2189- 2335
#  2: 6186- 3072
#  3: 6170- 1900
#  4: 61801  650
#  5: 2181-  618
#  6: 6268-  509
#  7: 61804  307
#  8: 6259-  272
#  9: 61889  265
# 10: 6253-  242
# 11: 2180-  224
# 12: 6256- 6407
# 13: 6180- 3254
# 14:       2333
# 15:       2312
# 16: 4019- 1254
# 17: 6181- 1156
# 18: 6184- 1035
like image 112
David Arenburg Avatar answered Jun 30 '26 21:06

David Arenburg