How to group similar rows in R

Question

I have a table as follows:

   Rptname     Score

    Bebo23        8
    Bebo22        9
    Bebo19        10
    Alt88         12
    Alt67         11
    Jimm          5
    Jimm2         7

etc. I would like to sum into groups those rows that are similar. ie

     Bebo         27
     Alt          22
     Jimm         12

The beginning of the row name is always the similar part to group on but the number of characters that are similar can vary. I appreciate that I will have to define the groups and probably using some kind of regular expression but I'm not sure how to group and sum on this basis. Thanks for your help in advance

akrun · Accepted Answer

You can strip off the numbers at the end using sub and do aggregate

do.call(`data.frame`, aggregate(Score~cbind(Rptname=sub('\d+$', '', 
                        Rptname)), df, sum))
#    Rptname Score
#1     Alt    23
#2    Bebo    27
#3    Jimm    12

Or use transform with aggregate (as suggested by @docendo discimus)

aggregate(Score ~ Rptname, transform(df, Rptname = sub("\d+$", 
                        "", Rptname)), sum)

Or an option with data.table

library(data.table)
 setDT(df)[, .(Score=sum(Score)),
           by=list(Rptname=sub('\d+$','', Rptname))]

Or using rowsum (suggested by @alexis_laz

with(df, rowsum(Score, sub('\d+$', '', Rptname)))
#     [,1]
#Alt    23
#Bebo   27
#Jimm   12

Update

If the grouping is based on first three characters, you can use substr

aggregate(Score~Rptname, transform(df, Rptname=substr(Rptname, 1,3)), sum)
#   Rptname Score
#1     Alt    23
#2     Beb    27
#3     Jim    12

talat · Answer

With dplyr:

library(dplyr)
DF %>% group_by(Rptname = sub("\d+$", "", Rptname)) %>% summarise(Score = sum(Score))
#Source: local data frame [3 x 2]
#
#  Rptname Score
#1     Alt    23
#2    Bebo    27
#3    Jimm    12

Update:

If you want to group by the first three letters in "Rptname", you can use the following code in dplyr:

DF %>% group_by(Rptname = substr(Rptname, 1, 3)) %>% summarise(Score = sum(Score))
#Source: local data frame [3 x 2]
#
#  Rptname Score
#1     Alt    23
#2     Beb    27
#3     Jim    12

How to group similar rows in R

Tags:

r

grouping

Sebastian Zeki

Video Answer

2 Answers

Update

akrun

talat

Recent Activity

Donate For Us

How to group similar rows in R

Tags:

r

grouping

Sebastian Zeki

Video Answer

2 Answers

Update

akrun

talat

Related questions

Recent Activity

Donate For Us