Summarize different Columns with different Functions

I have the following problem: I have a data frame with many rows and columns, where the first column is the date. Each date has more than one observation, and I want to summarize them.

My df looks like this (date replaced by ID for ease of use):

df:
ID     Cash    Price    Weight   ...
1      0.4     0        0        ...
1      0.2     0        82       ...
1      0       1        0        ...
1      0       3.2      80       ...
2      0.3     1        70       ...
...    ...     ...      ...      ...

I want to group the rows by the first column and then summarize every other column, but with different functions:

Cash and Price should be summarized with sum, so I get the sum of Cash and Price for each ID. Weight should be summarized with max, so I only get the maximum weight for each ID.

Because I have so many columns, I cannot write out all the functions by hand. Only 2 columns should be summarized with max; the rest should be summarized with sum.

So I am looking for a way to group by ID and summarize all columns with sum, except for 2 columns for which I need the max value.

I tried to use the dplyr package with:

df %>% group_by(ID = tolower(ID)) %>% summarise_each(funs(sum))

But I also need to apply max instead of sum to the 2 specified columns. Any ideas?

To be clear, the output of the example df should be:

ID     Cash    Price    Weight
1      0.6     4.2      82
2      0.3     1        70
Max asked Apr 24 '16 11:04


1 Answer

We can use summarise and name the function for each column explicitly:

 df %>%
    group_by(ID) %>%
    summarise(Cash = sum(Cash), Price = sum(Price), Weight = max(Weight))

If we have many columns, one way would be to summarise the sum columns and the max columns separately and then join the two outputs on ID.

 df1 <- df %>% 
          group_by(ID) %>% 
          summarise_each(funs(sum), Cash:Price)
 df2 <- df %>%
          group_by(ID) %>% 
          summarise_each(funs(max), Weight)
 inner_join(df1, df2, by = "ID")
 #      ID  Cash Price Weight
 #   (int) (dbl) (dbl)  (int)
 #1     1   0.6   4.2     82
 #2     2   0.3   1.0     70
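
As a possible alternative on newer dplyr versions: summarise_each() and funs() have since been deprecated, and dplyr 1.0.0 and later provide across(), which lets both summaries happen in a single summarise() call without a join. The sketch below is only an illustration under assumptions: the data frame is rebuilt from the example rows in the question, and max_cols is a hypothetical helper vector naming the columns that should use max. Only Weight is listed because the question does not name the second max column.

```r
library(dplyr)

# Example data rebuilt from the rows shown in the question
df <- data.frame(
  ID     = c(1, 1, 1, 1, 2),
  Cash   = c(0.4, 0.2, 0, 0, 0.3),
  Price  = c(0, 0, 1, 3.2, 1),
  Weight = c(0, 82, 0, 80, 70)
)

# Hypothetical helper: names of the columns to summarise with max.
# Add the second max column here; every other column is summed.
max_cols <- c("Weight")

result <- df %>%
  group_by(ID) %>%
  summarise(
    # sum every non-grouping column that is not listed in max_cols
    across(!all_of(max_cols), sum),
    # take the maximum of the columns listed in max_cols
    across(all_of(max_cols), max)
  )

print(result)
```

across() never selects the grouping column, so ID does not need to be excluded by hand. With the example data this should reproduce the table requested in the question (one row per ID, with Cash and Price summed and Weight reduced to its maximum).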
akrun answered Nov 26 '22 10:11