Is there a way to sum data with ggplot2?
I want to draw a bubble map with the size of each point depending on the sum of z.
Currently I'm doing something like:
dd <- ddply(d, .(x,y), transform, z=sum(z))
qplot(x,y, data=dd, size=z)
But I feel I'm writing the same thing twice. I would like to be able to write something like:
qplot(x,y, data=dd, size=sum(z))
I had a look at stat_sum and stat_summary, but I'm not sure they are appropriate either.
Is it possible to do it with ggplot2? If not, what would be the best way to write those two lines?
It can be done using stat_sum within ggplot2. By default, the dot size represents proportions. To get dot size to represent counts, use size = ..n.. as an aesthetic. Counts (and proportions) by a third variable can be obtained by weighting by the third variable (weight = cost) as an aesthetic. Some examples, but first, some data.
library(ggplot2)
set.seed(321)
# Generate some data
df <- expand.grid(x = 1:5, y = 1:5, KEEP.OUT.ATTRS = FALSE)
df$Count <- sample(1:25, 25, replace = FALSE)
library(plyr)
new <- dlply(df, .(Count), function(data) matrix(rep(matrix(c(data$x, data$y), ncol = 2), data$Count), byrow = TRUE, ncol = 2))
df2 <- data.frame(do.call(rbind, new))
df2$cost <- 1:325
The data contains units categorised according to two factors, X1 and X2, and a third variable, cost, which is the cost of each unit.
Plot 1: Plots the proportion of elements at each X1 - X2 combination. group = 1 tells ggplot to calculate proportions out of the total number of units in the data frame.
ggplot(df2, aes(factor(X1), factor(X2))) +
stat_sum(aes(group = 1))
Plot 2: Plots the number of elements at each X1 - X2 combination.
ggplot(df2, aes(factor(X1), factor(X2))) +
stat_sum(aes(size = ..n..))
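In recent ggplot2 releases the ..n.. notation still works but triggers a deprecation warning. A minimal sketch of the equivalent after_stat(n) form, assuming ggplot2 3.3.0 or later; the toy data frame below is hypothetical and not part of the answer's data:

```r
library(ggplot2)

# Hypothetical toy data: three rows at (a, u), one at (a, v), two at (b, u)
toy <- data.frame(
  X1 = c("a", "a", "a", "a", "b", "b"),
  X2 = c("u", "u", "u", "v", "u", "u")
)

# after_stat(n) refers to the count computed by stat_sum,
# the same computed variable that ..n.. refers to
p <- ggplot(toy, aes(X1, X2)) +
  stat_sum(aes(size = after_stat(n)))

# Inspect the computed counts, one row per X1 - X2 combination
counts <- layer_data(p)[, c("x", "y", "n")]
print(counts)
```

layer_data() returns the data frame that the layer actually draws, so it is a convenient way to check what the stat computed before looking at the plot itself.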
Plot 3: Plots the cost of the elements at each X1 - X2 combination, that is, the count weighted by the third variable.
ggplot(df2, aes(x=factor(X1), y=factor(X2))) +
stat_sum(aes(group = 1, weight = cost, size = ..n..))
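This weighted form is what answers the original question: with a weight aesthetic, the n computed by stat_sum is the sum of the weights at each location. A small sketch that checks this against a sum computed with base R's aggregate(); the data frame d and its values are made up for illustration, and after_stat() assumes ggplot2 3.3.0 or later:

```r
library(ggplot2)

# Hypothetical data: x and y locate the bubble, z is the value to sum
d <- data.frame(
  x = c(1, 1, 2, 2, 2),
  y = c(1, 1, 1, 2, 2),
  z = c(10, 5, 3, 4, 6)
)

# Size each bubble by the sum of z at that (x, y) location
p <- ggplot(d, aes(x, y)) +
  stat_sum(aes(weight = z, size = after_stat(n)))

# Sums as computed by the stat
computed <- layer_data(p)[, c("x", "y", "n")]

# The same sums computed directly with base R
expected <- aggregate(z ~ x + y, data = d, FUN = sum)

# Put both in the same row order so they can be compared
computed <- computed[order(computed$x, computed$y), ]
expected <- expected[order(expected$x, expected$y), ]
```

After reordering, computed$n and expected$z should hold the same values, which means no separate ddply step is needed before plotting.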
Plot 4: Plots the proportion of the total cost of all elements in the data frame at each X1 - X2 combination
ggplot(df2, aes(x=factor(X1), y=factor(X2))) +
stat_sum(aes(group = 1, weight = cost))
Plot 5: Plots proportions, but instead of the proportion being out of the total cost across all elements in the data frame, the proportion is out of the cost for elements within each category of X1. That is, within each X1 category, where does the major cost for X2 units occur?
ggplot(df2, aes(x=factor(X1), y=factor(X2))) +
stat_sum(aes(group = X1, weight = cost))
You could put the ddply call into the qplot call:
library(ggplot2)
library(plyr)
d <- data.frame(x=1:10, y=1:10, z=runif(100))
qplot(x, y, data=ddply(d, .(x,y), transform, z=sum(z)), size=z)
Or use the data.table package.
library(data.table)
DT <- data.table(d, key='x,y')
qplot(x, y, data=DT[, sum(z), by='x,y'], size=V1)
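One difference between the two approaches is worth noting: ddply with transform keeps every original row, so identical points are drawn on top of each other, while the data.table version returns one row per x, y pair. A sketch of a plyr variant that also returns one row per pair, using summarise in place of transform (the names kept_all and one_per_group are only for illustration):

```r
library(plyr)

set.seed(1)
d <- data.frame(x = 1:10, y = 1:10, z = runif(100))

# transform: every original row is kept, each carrying its group's sum
kept_all <- ddply(d, .(x, y), transform, z = sum(z))

# summarise: one row per (x, y) combination, holding the sum of z
one_per_group <- ddply(d, .(x, y), summarise, z = sum(z))

nrow(kept_all)       # number of rows in d
nrow(one_per_group)  # number of distinct (x, y) pairs
```

Passing one_per_group as the data argument should give the same picture with far fewer overplotted points.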