I have a table as follows:
Rptname Score
Bebo23 8
Bebo22 9
Bebo19 10
Alt88 12
Alt67 11
Jimm 5
Jimm2 7
etc. I would like to sum into groups those rows that are similar. ie
Bebo 27
Alt 22
Jimm 12
The beginning of the row name is always the similar part to group on but the number of characters that are similar can vary. I appreciate that I will have to define the groups and probably using some kind of regular expression but I'm not sure how to group and sum on this basis. Thanks for your help in advance
You can strip off the numbers at the end using sub
and do aggregate
do.call(`data.frame`, aggregate(Score~cbind(Rptname=sub('\\d+$', '',
Rptname)), df, sum))
# Rptname Score
#1 Alt 23
#2 Bebo 27
#3 Jimm 12
Or use transform
with aggregate
(as suggested by @docendo discimus)
aggregate(Score ~ Rptname, transform(df, Rptname = sub("\\d+$",
"", Rptname)), sum)
Or an option with data.table
library(data.table)
setDT(df)[, .(Score=sum(Score)),
by=list(Rptname=sub('\\d+$','', Rptname))]
Or using rowsum
(suggested by @alexis_laz
with(df, rowsum(Score, sub('\\d+$', '', Rptname)))
# [,1]
#Alt 23
#Bebo 27
#Jimm 12
If the grouping is based on first three characters, you can use substr
aggregate(Score~Rptname, transform(df, Rptname=substr(Rptname, 1,3)), sum)
# Rptname Score
#1 Alt 23
#2 Beb 27
#3 Jim 12
With dplyr:
library(dplyr)
DF %>% group_by(Rptname = sub("\\d+$", "", Rptname)) %>% summarise(Score = sum(Score))
#Source: local data frame [3 x 2]
#
# Rptname Score
#1 Alt 23
#2 Bebo 27
#3 Jimm 12
Update:
If you want to group by the first three letters in "Rptname", you can use the following code in dplyr:
DF %>% group_by(Rptname = substr(Rptname, 1, 3)) %>% summarise(Score = sum(Score))
#Source: local data frame [3 x 2]
#
# Rptname Score
#1 Alt 23
#2 Beb 27
#3 Jim 12
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With