
How to calculate the percentage of data points belonging to a range of values?

Tags: r

Given a table of values (say between 0 and 100) and the attached plot, what would be the simplest way in R to calculate how many of the data points fall between 20 and 60 (the red box in the image)?

And is there a way to draw that red box using R's plotting functions (I did it using an image editor...)?

Thanks for the help.

[Image: the plot, with a red box marking the 20 to 60 range]

user971956 asked Nov 30 '22 14:11


1 Answer

To calculate the proportion of data points (the empirical probability mass) that falls within the interval; multiply by 100 for a percentage:

x <- rnorm(1e6)  ## data forming your empirical distribution
ll <- -1.96      ## lower bound of interval of interest
ul <- 1.96       ## upper bound of interval of interest

sum(x > ll & x < ul)/length(x)
# [1] 0.949735
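
The same idea applied to the range in the question (20 to 60) might look like the sketch below. The vector `vals` is a hypothetical stand-in for the asker's table, generated here with `runif()`, and treating both bounds as inclusive is an assumption; swap `>=`/`<=` for `>`/`<` if the endpoints should be excluded.

```r
# Hypothetical stand-in for the asker's table of values between 0 and 100
set.seed(42)
vals <- runif(1000, min = 0, max = 100)

lower <- 20   # lower bound of the range of interest
upper <- 60   # upper bound of the range of interest

# TRUE for every point inside the range (inclusive bounds are an assumption)
in_range <- vals >= lower & vals <= upper

n_in   <- sum(in_range)         # number of points inside the range
pct_in <- 100 * mean(in_range)  # mean of a logical vector is the proportion of TRUEs

n_in
pct_in
```

If the data can contain missing values, `mean(in_range, na.rm = TRUE)` gives the percentage among the non-missing points only.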

And then to plot the histogram and the red box:

h <- hist(x, breaks=100, plot=FALSE)       # Calculate but don't plot histogram
maxct <- max(h$counts)                     # Extract height of the tallest bar
## Or, if you want the height of the tallest bar within the interval
# start <- findInterval(ll, h$breaks)
# end   <- findInterval(ul, h$breaks)
# maxct <- max(h$counts[start:end])

plot(h, ylim=c(0, 1.05*maxct), col="blue") # Plot, leaving a bit of space up top

rect(xleft = ll, ybottom = -0.02*maxct,    # Add box extending a bit above
     xright = ul, ytop = 1.02*maxct,       # and a bit below the bars
     border = "red", lwd = 2)
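
As a variation, the box can be stretched over the full height of the plotting region and given a translucent fill, so there is no need to compute bar heights first. This is only a sketch: `vals` is again hypothetical data, and the 20 to 60 bounds come from the question rather than from the example above.

```r
# Hypothetical data standing in for the asker's values
set.seed(1)
vals <- runif(1000, min = 0, max = 100)

hist(vals, breaks = 50, col = "blue")

# par("usr") returns the plot region limits as c(x1, x2, y1, y2)
usr <- par("usr")

rect(xleft = 20, ybottom = usr[3],             # box spans the whole plot height
     xright = 60, ytop = usr[4],
     col = adjustcolor("red", alpha.f = 0.2),  # translucent fill keeps bars visible
     border = "red", lwd = 2)
```

Because `rect()` draws on top of whatever plot is current, the same call should work after `plot()` for a scatterplot as well.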


Josh O'Brien answered Dec 15 '22 12:12
