R sort summarise ddply by group sum

Question

I have a data.frame like this

x <- data.frame(Category=factor(c("One", "One", "Four", "Two","Two",
"Three", "Two", "Four","Three")),
City=factor(c("D","A","B","B","A","D","A","C","C")),
Frequency=c(10,1,5,2,14,8,20,3,5))

  Category City Frequency
1      One    D        10
2      One    A         1
3     Four    B         5
4      Two    B         2
5      Two    A        14
6    Three    D         8
7      Two    A        20
8     Four    C         3
9    Three    C         5

I want to make a pivot table with sum(Frequency) and used the ddply function like this:

ddply(x,.(Category,City),summarize,Total=sum(Frequency))
  Category City Total
1     Four    B     5
2     Four    C     3
3      One    A     1
4      One    D    10
5    Three    C     5
6    Three    D     8
7      Two    A    34
8      Two    B     2

But I need this results sorted by the total in each Category group. Something like this:

Category City Frequency
1      Two    A        34
2      Two    B         2
3    Three    D        14
4    Three    C         5
5      One    D        10
6      One    A         1
7     Four    B         5
8     Four    C         3

I have looked and tried sort, order, arrange, but nothing seems to do what I need. How can I do this in R?

BrodieG · Accepted Answer

Here is a base R version, where DF is the result of your ddply call:

with(DF, DF[order(-ave(Total, Category, FUN=sum), Category, -Total), ])

produces:

  Category City Total
7      Two    A    34
8      Two    B     2
6    Three    D     8
5    Three    C     5
4      One    D    10
3      One    A     1
1     Four    B     5
2     Four    C     3

The logic is basically the same as David's, calculate the sum of Total for each Category, use that number for all rows in each Category (we do this with ave(..., FUN=sum)), and then sort by that plus some tie breakers to make sure stuff comes out as expected.

David Arenburg · Answer

This is a nice question and I can't think of a straight way of doing this rather than creating a total size index and then sorting by it. Here's a possible data.table approach which uses setorder function which will order your data by reference

library(data.table)
Res <- setDT(x)[, .(Total = sum(Frequency)), by = .(Category, City)]
setorder(Res[, size := sum(Total), by = Category], -size, -Total, Category)[]
#    Category City Total size
# 1:      Two    A    34   36
# 2:      Two    B     2   36
# 3:    Three    D     8   13
# 4:    Three    C     5   13
# 5:      One    D    10   11
# 6:      One    A     1   11
# 7:     Four    B     5    8
# 8:     Four    C     3    8

Or if you deep in the Hdleyverse, we can reach a similar result using the newer dplyr package (as suggested by @akrun)

library(dplyr)
x %>% 
  group_by(Category, City) %>% 
  summarise(Total = sum(Frequency)) %>% 
  mutate(size= sum(Total)) %>% 
  ungroup %>%
  arrange(-size, -Total, Category)

R sort summarise ddply by group sum

Tags:

r

pivot-table

plyr

Liliana Pacheco

2 Answers

BrodieG

David Arenburg

Recent Activity

Donate For Us

R sort summarise ddply by group sum

Tags:

r

pivot-table

plyr

Liliana Pacheco

2 Answers

BrodieG

David Arenburg

Related questions

Recent Activity

Donate For Us