I want to plot a histogram where the y-axis represents the sum of a column. I found this example for categorical data: R histogram that sums rather than frequency. However, this is not what I am looking for, as it does not apply to continuous data, where I would have to define the bins.
Let's say I have x and y:
set.seed(1)
mydata <- data.frame(y = runif(100, min = 0, max = 1),
                     x = rpois(100, 15) * 10)
A traditional histogram would be:
hist(mydata$x)
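Both approaches below build on the bins that hist() chooses. As a minimal sketch, calling hist() with plot = FALSE returns the "histogram" object without drawing it, exposing the bin edges (breaks) and the per-bin frequencies (counts) that a sum-based version has to replace:

```r
set.seed(1)
mydata <- data.frame(y = runif(100, min = 0, max = 1),
                     x = rpois(100, 15) * 10)

# plot = FALSE returns the "histogram" object without drawing it
h <- hist(mydata$x, plot = FALSE)

# Bin edges chosen by hist() (Sturges' rule by default, passed through pretty())
print(h$breaks)

# Number of observations per bin; always one entry fewer than breaks
print(h$counts)
```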
Now, how can I get the total of y within each bin of x on the y-axis?
This is one way to solve the problem. It leverages the hist() function for most of the heavy lifting, and has the advantage that the bar plot of the summed y values matches the bins and dimensions of the histogram of x:
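set.seed(1)
mydata <- data.frame(y = runif(100, min = 0, max = 1), x = rpois(100, 15) * 10)
mx <- mydata$x
my <- mydata$y
# Let hist() choose the breaks, but don't draw the count histogram yet
h <- hist(mx, plot = FALSE)
# Assign each x to a bin with the same convention hist() uses by default:
# right-closed intervals (a, b], with the lowest break included
bins <- cut(mx, breaks = h$breaks, include.lowest = TRUE)
# Sum y within each bin; empty bins get 0 instead of NA
sums <- tapply(my, bins, sum, default = 0)
# Replace the counts with the sums and plot the modified histogram
h$counts <- as.vector(sums)
plot(h, ylab = "Sum", main = "Sum of y Within x Bins")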
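The claim that the summed bars line up with the histogram's bins can be checked directly. A minimal sketch, assuming hist()'s default right-closed binning: cutting x on h$breaks with include.lowest = TRUE should reproduce h$counts exactly, and because every observation falls into exactly one bin, the per-bin sums should add up to sum(y):

```r
set.seed(1)
mydata <- data.frame(y = runif(100, min = 0, max = 1),
                     x = rpois(100, 15) * 10)
h <- hist(mydata$x, plot = FALSE)

# Same binning convention as hist(): (a, b], lowest break included
bins <- cut(mydata$x, breaks = h$breaks, include.lowest = TRUE)

# Per-bin counts from cut() should match hist()'s own counts
counts_check <- as.vector(table(bins))

# Per-bin sums of y; empty bins become 0
sums <- as.vector(tapply(mydata$y, bins, sum, default = 0))

stopifnot(identical(counts_check, as.integer(h$counts)))
stopifnot(isTRUE(all.equal(sum(sums), sum(mydata$y))))
```

Using a left-closed test such as x >= a & x < b instead would move every value that sits exactly on a break into the neighbouring bin, and would drop a maximum that equals the last break; with x restricted to multiples of 10, such boundary values are likely.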
Summarizing all the comments, this is what I wanted. Thanks @Alex A.
set.seed(1)
mydata <- data.frame(y = runif(100, min = 0, max = 1), x = rpois(100, 15) * 10)
# Cut x into Sturges' number of equal-width bins and sum y within each bin
# (aggregate() only returns bins that contain at least one observation)
a <- aggregate(mydata$y, by = list(bin = cut(mydata$x, nclass.Sturges(mydata$x))), FUN = sum)
# Bin labels look like "(69.8,88.3]"; keep only the upper bound as the bar name
ab2 <- sub("^.*,(.*)\\]$", "\\1", as.character(a$bin))
barplot(a$x, names.arg = ab2)
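The label parsing works, but aggregate() returns only the bins that contain data, so an empty bin silently disappears from the bar plot instead of showing as a zero-height bar. A possible alternative, not taken from the thread and sketched under the assumption that round pretty() break points are acceptable, is to choose the breaks explicitly: empty bins are then kept, and the bar labels come straight from the break values without any string manipulation:

```r
set.seed(1)
mydata <- data.frame(y = runif(100, min = 0, max = 1),
                     x = rpois(100, 15) * 10)

# Alternative (not from the thread): explicit round break points that cover
# the range of x, with roughly Sturges' number of bins
brks <- pretty(range(mydata$x), n = nclass.Sturges(mydata$x))
bins <- cut(mydata$x, breaks = brks, include.lowest = TRUE)

# tapply() keeps every factor level, so empty bins become zero-height bars
sums <- tapply(mydata$y, bins, sum, default = 0)

# Label each bar with its upper break, like the label-parsing version
barplot(as.vector(sums), names.arg = brks[-1],
        xlab = "Upper bound of x bin", ylab = "Sum of y")
```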