I have raw data and want to see what percentage of observations falls above each possible cutoff level. Here is a simulation:
data<-rnorm(100,50,30)
prop.table(table(data>10))
prop.table(table(data>20))
prop.table(table(data>30))
prop.table(table(data>40))
prop.table(table(data>50))
prop.table(table(data>60))
prop.table(table(data>70))
prop.table(table(data>80))
prop.table(table(data>90))
Here is the output:
FALSE TRUE
0.1 0.9
FALSE TRUE
0.16 0.84
FALSE TRUE
0.29 0.71
FALSE TRUE
0.36 0.64
FALSE TRUE
0.51 0.49
FALSE TRUE
0.61 0.39
FALSE TRUE
0.75 0.25
FALSE TRUE
0.86 0.14
FALSE TRUE
0.91 0.09
But this is a crude and inefficient approach, as you would agree. Instead of calculating the percentage for each cutoff value one at a time, I want to build a plot of that relationship, where the X axis represents the range of all possible cutoff levels and the Y axis represents percentages from 0 to 100. Something similar to this:
Please ignore the axis labels etc. of the plot; this is only meant as a general example. Any suggestions?
I believe you are looking for the ecdf() function, which creates an empirical cumulative distribution function.
data<-rnorm(1000,50,30)
a = ecdf(data)
plot(a)
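Note that ecdf() returns the share of observations at or below a value, while the question asks for the share above each cutoff. A minimal sketch of one way to plot the complement, in percent, follows. The seed, the cutoffs grid and the pct_above name are assumptions added for illustration, not part of the original answer.

```r
set.seed(42)                       # assumption: fixed seed so the sketch is reproducible
data <- rnorm(1000, 50, 30)
a <- ecdf(data)                    # a(x) = proportion of observations <= x

# Evaluate the complement on a grid of cutoffs spanning the data range
cutoffs <- seq(min(data), max(data), length.out = 200)
pct_above <- 100 * (1 - a(cutoffs))   # percentage of observations strictly above each cutoff

plot(cutoffs, pct_above, type = "s", ylim = c(0, 100),
     xlab = "Cutoff", ylab = "% of observations above cutoff")

# Spot check against the direct calculation used in the question
all.equal(1 - a(50), mean(data > 50))
```

The spot check on the last line compares the complement of the ECDF with the table-based proportion from the question for a single cutoff of 50. The two should agree, since both count the observations strictly greater than the cutoff.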