
aggregate/sum with ggplot

Tags: r, ggplot2

Is there a way to sum data with ggplot2?

I want to draw a bubble map where the size of each point depends on the sum of z.

Currently I'm doing something like

dd <- ddply(d, .(x,y), transform, z=sum(z))
qplot(x,y, data=dd, size=z)

But I feel I'm writing the same thing twice. I would like to be able to write something like

qplot(x,y, data=dd, size=sum(z))

I had a look at stat_sum and stat_summary, but I'm not sure they are appropriate either.

Is it possible to do it with ggplot2? If not, what would be the best way to write those 2 lines?

mb14 asked Jun 27 '12 09:06


2 Answers

It can be done using stat_sum within ggplot2. In the ggplot2 version this answer was written for, the dot size represents proportions by default. To make the dot size represent counts, use size = ..n.. as an aesthetic. Counts (and proportions) weighted by a third variable can be obtained by adding that variable as a weight aesthetic (weight = cost). Some examples follow, but first, some data.

library(ggplot2)
library(plyr)
set.seed(321)
# Generate some data: a 5 x 5 grid with a random count for each cell
df <- expand.grid(x = 1:5, y = 1:5, KEEP.OUT.ATTRS = FALSE)
df$Count <- sample(1:25, 25, replace = FALSE)
# Repeat each (x, y) pair Count times, giving one row per unit
new <- dlply(df, .(Count), function(data) matrix(rep(matrix(c(data$x, data$y), ncol = 2), data$Count), byrow = TRUE, ncol = 2))
df2 <- data.frame(do.call(rbind, new))
# Give each unit a cost
df2$cost <- 1:325

Each row of df2 is one unit, categorised by two factors, X1 and X2, with a third variable giving the cost of that unit.

Plot 1: Plots the proportion of elements at each X1 - X2 combination. group=1 tells ggplot to calculate proportions out of the total number of units in the data frame.

ggplot(df2, aes(factor(X1), factor(X2))) + 
  stat_sum(aes(group = 1))


Plot 2: Plots the number of elements at each X1 - X2 combination.

ggplot(df2, aes(factor(X1), factor(X2))) + 
  stat_sum(aes(size = ..n..))


Plot 3: Plots the total cost of the elements at each X1 - X2 combination, that is, the counts weighted by the third variable.

ggplot(df2, aes(x=factor(X1), y=factor(X2))) + 
     stat_sum(aes(group = 1, weight = cost, size = ..n..)) 


Plot 4: Plots the proportion of the total cost of all elements in the data frame at each X1 - X2 combination.

ggplot(df2, aes(x=factor(X1), y=factor(X2))) + 
     stat_sum(aes(group = 1, weight = cost)) 


Plot 5: Plots proportions, but instead of the proportion being out of the total cost across all elements in the data frame, the proportion is out of the cost for elements within each category of X1. That is, within each X1 category, where does the major cost for X2 units occur?

ggplot(df2, aes(x=factor(X1), y=factor(X2))) + 
     stat_sum(aes(group = X1, weight = cost)) 
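
Applied to the kind of data in the question, the weighting idea might look like the sketch below. It assumes a data frame d with columns x, y and z (made up here for illustration) and ggplot2 3.3.0 or later, where after_stat(n) is the current spelling of ..n... The size mapping is written out explicitly because the default may differ between ggplot2 versions. With weight = z, the computed n is the sum of z in each x-y cell rather than a row count.

```r
library(ggplot2)

# Hypothetical data: 100 rows spread over a 5 x 5 grid, with a value z per row
set.seed(321)
d <- data.frame(
  x = sample(1:5, 100, replace = TRUE),
  y = sample(1:5, 100, replace = TRUE),
  z = runif(100)
)

# weight = z makes stat_sum add up z within each (x, y) cell;
# after_stat(n) maps that weighted total to point size
p <- ggplot(d, aes(factor(x), factor(y))) +
  stat_sum(aes(weight = z, size = after_stat(n)))

# Inspect the values ggplot2 computed for the layer
sums <- layer_data(p)
head(sums[, c("x", "y", "n")])
```

Comparing sums$n with a manual aggregation of z by x and y is a quick way to check that the plot shows the totals you expect.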


Sandy Muspratt answered Nov 04 '22 13:11


You could put the ddply call into the qplot:

library(plyr)
library(ggplot2)
d <- data.frame(x=1:10, y=1:10, z=runif(100))
qplot(x, y, data=ddply(d, .(x,y), transform, z=sum(z)), size=z)
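
One detail worth noting: transform keeps every original row and only overwrites z, so identical points are drawn on top of each other. A possible variant, sketched here under the assumption that one row per group is what you want, uses plyr's summarise instead. The names dd_all and dd_sum are only illustrative, and ggplot() is used in place of qplot(), which is deprecated in recent ggplot2 releases.

```r
library(plyr)
library(ggplot2)

# Same shape of data as above: x and y recycle to 100 rows, so x == y throughout
set.seed(1)
d <- data.frame(x = 1:10, y = 1:10, z = runif(100))

# transform: all 100 rows are kept, each carrying its group's sum
dd_all <- ddply(d, .(x, y), transform, z = sum(z))

# summarise: collapsed to one row per (x, y) group
dd_sum <- ddply(d, .(x, y), summarise, z = sum(z))

# One point per group, sized by the summed z
p <- ggplot(dd_sum, aes(x, y, size = z)) +
  geom_point()
p
```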

Or use the data.table package.

library(data.table)
# The unnamed sum(z) column is called V1 by default
DT <- data.table(d, key='x,y')
qplot(x, y, data=DT[, sum(z), by='x,y'], size=V1)
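
If you would rather not depend on plyr or data.table, base R's aggregate() can produce the same per-group sums. This is a minimal sketch, assuming a data frame d with columns x, y and z as in the example above; the name totals is only illustrative.

```r
library(ggplot2)

# Example data: x and y recycle to 100 rows, so x == y throughout
set.seed(1)
d <- data.frame(x = 1:10, y = 1:10, z = runif(100))

# One row per (x, y) pair, with z replaced by its group sum
totals <- aggregate(z ~ x + y, data = d, FUN = sum)

# Point size reflects the summed z
p <- ggplot(totals, aes(x, y, size = z)) +
  geom_point()
p
```

The aggregation still happens outside ggplot2, but it stays a single, readable line before the plot call.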
user1486971 answered Nov 04 '22 13:11