Histogram of sum instead of frequency - R

Tags: r, histogram

I want to plot a histogram where the y-axis represents the sum of a column. I found this example for categorical data: R histogram that sums rather than frequency. However, this is not what I am looking for, as it does not apply to continuous data, where I would have to define the bins.

Let's say I have x and y:

set.seed(1)
mydata <- data.frame(y = runif(100, min = 0, max = 1),
                     x = rpois(100, 15) * 10)

A traditional histogram will be like:

hist (mydata$x)


Now how can I get the sum of y within each bin of x on the y-axis, instead of the count?

AEM asked May 06 '15 16:05


2 Answers

One way to solve this is to let the hist() function do most of the heavy lifting. The advantage is that the bars showing the sum of y use the same bins and dimensions as the histogram of x:

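set.seed(1)
mydata <- data.frame(y = runif(100, min = 0, max = 1), x = rpois(100, 15) * 10)
mx <- mydata$x
my <- mydata$y

# Compute the bins without drawing; the plot is produced once the counts are replaced
h <- hist(mydata$x, plot = FALSE)

breaks <- data.frame(
    "beg"=h$breaks[-length(h$breaks)],
    "end"=h$breaks[-1]
)

# hist() bins are right-closed by default, (beg, end], and the first bin
# also includes its left edge, so the sums follow the same rule
sums <- apply(breaks, MARGIN=1, FUN=function(b) {
    in_bin <- mx > b[1] & mx <= b[2]
    if (b[1] == h$breaks[1]) in_bin <- in_bin | mx == b[1]
    sum(my[in_bin])
})

h$counts <- sums
plot(h, ylab="Sum", main="Sum of y Within x Bins")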
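
One detail worth checking with this approach is the interval convention. By default hist() uses right-closed bins (a, b] and includes the lowest edge in the first bin, so the per-bin sums should follow the same rule, especially here where many x values fall exactly on a break. The sketch below is one possible way to do that with cut() and tapply(); sum_by_bin is a hypothetical helper name used for illustration, not a base R function.

```r
# Sketch: sum y within the bins that hist() chooses for x.
# sum_by_bin() is a hypothetical helper, not part of base R.
sum_by_bin <- function(x, y, breaks) {
  # Same convention as the hist() defaults: (a, b] bins, lowest edge included
  bins <- cut(x, breaks = breaks, right = TRUE, include.lowest = TRUE)
  sums <- tapply(y, bins, sum)
  sums[is.na(sums)] <- 0   # bins with no observations come back as NA
  as.numeric(sums)
}

set.seed(1)
mydata <- data.frame(y = runif(100, min = 0, max = 1), x = rpois(100, 15) * 10)

h <- hist(mydata$x, plot = FALSE)
h$counts <- sum_by_bin(mydata$x, mydata$y, h$breaks)

# Every observation lands in exactly one bin, so the bars add up to sum(y)
stopifnot(isTRUE(all.equal(sum(h$counts), sum(mydata$y))))

plot(h, ylab = "Sum", main = "Sum of y Within x Bins")
```

The stopifnot() line is a cheap sanity check: if the bin rule drops or double-counts a value that sits on a break, the total no longer matches sum(y).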

Forrest R. Stevens answered Oct 03 '22 23:10


Summarizing all comments, this is what I wanted to have. Thanks @Alex A.

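set.seed(1)

mydata <- data.frame(y = runif(100, min = 0, max = 1), x = rpois(100, 15) * 10)

# Sum y within bins of x; cut() splits x into nclass.Sturges(x) equal-width intervals
a <- aggregate(mydata$y, by = list(bin = cut(mydata$x, nclass.Sturges(mydata$x))), FUN = sum)

# Turn labels such as "(a,b]" into "(a b" so the upper bound can be split off
a$bin <- gsub("]", "", as.character(a$bin), fixed = TRUE)
a$bin <- gsub(",", " ", a$bin, fixed = TRUE)

# Label each bar with the upper bound of its bin
ab2 <- sapply(strsplit(a$bin, " "), "[", 2)
barplot(a$x, names.arg = ab2)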
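
Two things to be aware of with the aggregate() version: it returns rows only for bins that contain data, so an empty bin would silently disappear from the bar plot, and the labels are recovered by parsing strings. A minimal sketch of a variant that avoids both is shown below. It assumes explicit breaks built with pretty() and the Sturges bin count, which is similar to what hist() does by default but is a choice made here for illustration.

```r
# Sketch: bar plot of per-bin sums that keeps empty bins and uses numeric labels.
set.seed(1)
mydata <- data.frame(y = runif(100, min = 0, max = 1), x = rpois(100, 15) * 10)

# Explicit breaks (assumption: pretty() with the Sturges bin count)
breaks <- pretty(range(mydata$x), n = nclass.Sturges(mydata$x))

bins <- cut(mydata$x, breaks = breaks, include.lowest = TRUE)

# tapply() returns one entry per factor level, with NA for empty levels
sums <- tapply(mydata$y, bins, sum)
sums[is.na(sums)] <- 0

# Label each bar with the upper edge of its bin, taken directly from the breaks
barplot(as.numeric(sums), names.arg = breaks[-1],
        xlab = "x (upper bin edge)", ylab = "Sum of y")
```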

AEM answered Oct 04 '22 01:10