Summing all columns by group [duplicate]

Tags: r, aggregate

I'm sure this has an easy answer, but I can't get my head around aggregating or casting with multiple conditions.

I have a table that looks like this:

> head(df, n=10L)
   STATE  EVTYPE FATALITIES INJURIES
1     AL TORNADO          0       15
3     AL TORNADO          0        2
4     AL TORNADO          0        2
5     AL TORNADO          0        2
6     AL TORNADO          0        6
7     AL TORNADO          0        1
9     AL TORNADO          1       14
11    AL TORNADO          0        3
12    AL TORNADO          0        3
13    AL TORNADO          1       26

Obviously this goes on. What I want to do is collapse by STATE and EVTYPE, summing FATALITIES and INJURIES as I go, so that if these 10 rows were my full dataset the result would be a single-row data frame:

   STATE  EVTYPE FATALITIES INJURIES
1     AL TORNADO          2       74

My complete data frame has many states and many EVTYPEs.

NoobMat asked Jan 22 '15 13:01



2 Answers

You can try

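library(dplyr)  # across() needs dplyr >= 1.0.0
# Sum every non-grouping column within each STATE/EVTYPE group
df %>% 
    group_by(STATE, EVTYPE) %>% 
    summarise(across(everything(), sum), .groups = "drop")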

Or

aggregate(. ~ STATE + EVTYPE, data = df, FUN = sum)
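
As a quick check of the aggregate() form, here is a minimal sketch that rebuilds the ten rows shown in the question by hand (the data.frame() construction is my own assumption about how the data is stored; the values are copied from the post) and sums everything that is not a grouping column:

```r
# Rebuild the ten sample rows from the question (values copied from the post)
df <- data.frame(
  STATE      = rep("AL", 10),
  EVTYPE     = rep("TORNADO", 10),
  FATALITIES = c(0, 0, 0, 0, 0, 0, 1, 0, 0, 1),
  INJURIES   = c(15, 2, 2, 2, 6, 1, 14, 3, 3, 26)
)

# "." on the left-hand side means "all columns not named on the right",
# so FATALITIES and INJURIES are both summed per STATE/EVTYPE combination
totals <- aggregate(. ~ STATE + EVTYPE, data = df, FUN = sum)
print(totals)
```

On these rows this should collapse to the single AL/TORNADO row the question asks for, with 2 fatalities and 74 injuries.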
akrun answered Nov 10 '22 12:11



Try ddply from plyr. The example below sums explicitly named columns, though I'm almost sure a wildcard or some other trick could sum all columns at once. Here the grouping is by STATE only; use .(STATE, EVTYPE) to group by both, as the question asks.

library(plyr)
df <- read.table(text = "STATE  EVTYPE FATALITIES INJURIES
1     AL TORNADO          0       15
3     AL TORNADO          0        2
4     AL TORNADO          0        2
5     AL TORNADO          0        2
6     AL TORNADO          0        6
7     AL TORNADO          0        1
9     AL TORNADO          1       14
11    AL TORNADO          0        3
12    AL TORNADO          0        3
13    AL TORNADO          1       26
14    IL FLOOD            0       15
15    IL FLOOD            0       20
16    IL FIRE             1        1", header = TRUE, sep = "")

# Named "result" rather than "c" so the base function c() is not shadowed
result <- ddply(df, .(STATE), summarise, val1 = sum(FATALITIES), val = sum(INJURIES))
print(result)

Result:

  STATE val1 val
1    AL    2  74
2    IL    1  36
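
One possible "sum all columns" trick, sketched here in base R rather than plyr: pick out the numeric columns programmatically and pass them to aggregate() with both grouping columns. The data frame below is rebuilt by hand from the 13 sample rows above, and the names num_cols and by_both are only illustrative.

```r
# The 13 sample rows from the answer above, rebuilt without read.table()
df <- data.frame(
  STATE      = c(rep("AL", 10), rep("IL", 3)),
  EVTYPE     = c(rep("TORNADO", 10), "FLOOD", "FLOOD", "FIRE"),
  FATALITIES = c(0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1),
  INJURIES   = c(15, 2, 2, 2, 6, 1, 14, 3, 3, 26, 15, 20, 1)
)

# Find the numeric columns instead of typing their names out
num_cols <- names(df)[sapply(df, is.numeric)]

# Sum each numeric column within every observed STATE/EVTYPE combination
by_both <- aggregate(df[num_cols], by = df[c("STATE", "EVTYPE")], FUN = sum)
print(by_both)
```

Unlike the STATE-only result above, this keeps IL FLOOD and IL FIRE as separate rows and retains the original column names.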
Alexey Ferapontov answered Nov 10 '22 12:11
