Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Combine rows and sum their values

Tags:

r

aggregate

Below there is a fraction of my original data frame. I need to combine those rows in which a specific id is repeated in a specific season and its lic and vessel are different. By combining I need sum qtty and grossTon.

Please take the id 431 in the season 1998 as an example (*).

season   lic     id  qtty   vessel   grossTon
…   
1998    16350   431  40     435       57
1998    16353   431  28     303       22.54
…   

The same subject 431 has two different lic (16350 & 16353) and two different vessels (435 & 303). The result expected in this specific case is:

  season     lic       id   qtty  vessel    grossTon
    …
    1998    16350      431   68     435     79.54
    …

I don't mind which lic and vessel remind in the resulting row, what I want is to keep season, id and the resulting sum of qtty and grossTon. In the above example I manually chose lic 16350 and vessel 435.

To be honest I have no idea what to do, so I'd much appreciate any help.

Thanks

Original data (*= rows to be combined)

season  lic id  qtty    vessel  grossTon
1998    15593   411 40  2643    31.5
1999    27271   411 40  2643    31.5
2000    35758   411 40  2643    31.5
2001    45047   411 50  2643    31.5
2002    56291   411 55  2643    31.5
2003    66991   411 55  2643    31.5
2004    80581   411 55  2643    31.5
2005    95058   411 52  NA  NA
2006    113379  411 50  10911   4.65
2007    120894  411 50  10911   4.65
2008    130033  411 50  2483    8.5
2009    139201  411 46  2296    50
2010    148833  411 46  2296    50
2011    158395  411 46  2296    50
1998    16350   431 40  435 57    # *
1998    16353   431 28  303 22.54 # *
2000    37491   436 50  2021    19.11
2001    47019   436 50  2021    19.11
2002    57588   436 51  2021    19.11
2003    69128   436 51  2021    19.11
2004    82400   436 52  2021    19.11
2005    95599   436 50  2021    19.11
2006    113126  436 50  2021    19.11
2007    122387  436 50  2021    19.11
2008    131126  436 50  2021    19.11
2009    140417  436 50  2021    19.11
2010    150673  436 50  2021    19.11
2011    159776  436 50  2021    19.11

Also I need keep those previous and following rows that have just one id per season. Like this: (*=row resulting after being combined)

 season lic id  qtty    vessel  grossTon
1998    15593   411 40  2643    31.5
1999    27271   411 40  2643    31.5
2000    35758   411 40  2643    31.5
2001    45047   411 50  2643    31.5
2002    56291   411 55  2643    31.5
2003    66991   411 55  2643    31.5
2004    80581   411 55  2643    31.5
2005    95058   411 52  NA  NA
2006    113379  411 50  10911   4.65
2007    120894  411 50  10911   4.65
2008    130033  411 50  2483    8.5
2009    139201  411 46  2296    50
2010    148833  411 46  2296    50
2011    158395  411 46  2296    50
1998    16350   431 68  435 79.54 #*
2000    37491   436 50  2021    19.11
2001    47019   436 50  2021    19.11
2002    57588   436 51  2021    19.11
2003    69128   436 51  2021    19.11
2004    82400   436 52  2021    19.11
2005    95599   436 50  2021    19.11
2006    113126  436 50  2021    19.11
2007    122387  436 50  2021    19.11
2008    131126  436 50  2021    19.11
2009    140417  436 50  2021    19.11
2010    150673  436 50  2021    19.11
2011    159776  436 50  2021    19.11
like image 911
Rafael Avatar asked May 14 '13 04:05

Rafael


People also ask

How do I combine multiple rows of data into one row?

To merge two or more rows into one, here's what you need to do: Select the range of cells where you want to merge rows. Go to the Ablebits Data tab > Merge group, click the Merge Cells arrow, and then click Merge Rows into One.


2 Answers

If you turn your data.frame into a data.table you can make great use of the by argument

library(data.table)

DT <- data.table(DF)  # DF is your original data

then it is simply one line:

DT[, lapply(.SD, sum), by=list(season, lic, id, vessel)]

We can filter just the 1998 Season, if we'd like: '

DT[, lapply(.SD, sum), by=list(season, lic, id, vessel)][season==1998]
   season   lic  id vessel qtty grossTon
1:   1998 15593 411   2643   40    31.50
2:   1998 16350 431    435   68   114.00
3:   1998 16353 431    303   68    45.08

The entire result output looks like this:

    season    lic  id vessel qtty grossTon
 1:   1998  15593 411   2643   40    31.50
 2:   1999  27271 411   2643   40    31.50
 3:   2000  35758 411   2643   40    31.50
 4:   2001  45047 411   2643   50    31.50
 5:   2002  56291 411   2643   55    31.50
 6:   2003  66991 411   2643   55    31.50
 7:   2004  80581 411   2643   55    31.50
 8:   2005  95058 411     NA   52       NA
 9:   2006 113379 411  10911   50     4.65
10:   2007 120894 411  10911   50     4.65
11:   2008 130033 411   2483   50     8.50
12:   2009 139201 411   2296   46    50.00
13:   2010 148833 411   2296   46    50.00
14:   2011 158395 411   2296   46    50.00
15:   1998  16350 431    435   68   114.00
16:   1998  16353 431    303   68    45.08
17:   1999  28641 431    303   68    45.08
18:   1999  28644 431    435   68   114.00
19:   2000  37491 436   2021   50    19.11
20:   2001  47019 436   2021   50    19.11
21:   2002  57588 436   2021   51    19.11
22:   2003  69128 436   2021   51    19.11
23:   2004  82400 436   2021   52    19.11
24:   2005  95599 436   2021   50    19.11
25:   2006 113126 436   2021   50    19.11
26:   2007 122387 436   2021   50    19.11
27:   2008 131126 436   2021   50    19.11
28:   2009 140417 436   2021   50    19.11
29:   2010 150673 436   2021   50    19.11
30:   2011 159776 436   2021   50    19.11
    season    lic  id vessel qtty grossTon
like image 110
Ricardo Saporta Avatar answered Sep 18 '22 00:09

Ricardo Saporta


Here's a one-line base solution, following Frank's suggestion to use aggregate:

Df_agg <- aggregate(. ~  season + lic + id + vessel, data = DF, sum)
# DF is your data
# we use season + lic + id + vessel as the grouping elements

Inspect the output:

Df_agg[with(Df_agg, order(lic)), ] 
# check the output (sort for convenience), identical to Ricardo Saporta's output
   season    lic  id vessel qtty grossTon
21   1998  15593 411   2643   40    31.50
3    1998  16350 431    435   68   114.00
1    1998  16353 431    303   68    45.08
22   1999  27271 411   2643   40    31.50
2    1999  28641 431    303   68    45.08
4    1999  28644 431    435   68   114.00
23   2000  35758 411   2643   40    31.50
5    2000  37491 436   2021   50    19.11
24   2001  45047 411   2643   50    31.50
6    2001  47019 436   2021   50    19.11
25   2002  56291 411   2643   55    31.50
7    2002  57588 436   2021   51    19.11
26   2003  66991 411   2643   55    31.50
8    2003  69128 436   2021   51    19.11
27   2004  80581 411   2643   55    31.50
9    2004  82400 436   2021   52    19.11
10   2005  95599 436   2021   50    19.11
11   2006 113126 436   2021   50    19.11
28   2006 113379 411  10911   50     4.65
29   2007 120894 411  10911   50     4.65
12   2007 122387 436   2021   50    19.11
20   2008 130033 411   2483   50     8.50
13   2008 131126 436   2021   50    19.11
17   2009 139201 411   2296   46    50.00
14   2009 140417 436   2021   50    19.11
18   2010 148833 411   2296   46    50.00
15   2010 150673 436   2021   50    19.11
19   2011 158395 411   2296   46    50.00
16   2011 159776 436   2021   50    19.11

Check the year 1998, same as RS, seems that OP has an error in the desired output, 57 + 57 != 79.54 but = 114

Df_agg[Df_agg$season == 1998,]

   season   lic  id vessel qtty grossTon
21   1998 15593 411   2643   40    31.50
3    1998 16350 431    435   68   114.00
1    1998 16353 431    303   68    45.08
like image 40
Ben Avatar answered Sep 20 '22 00:09

Ben