
Building a continuous relationship curve between cutoff and percentages

Tags:

r

I have raw data and want to see what percentage of observations falls above a given cutoff level. Here is a simulation:

data<-rnorm(100,50,30)
prop.table(table(data>10))
prop.table(table(data>20))
prop.table(table(data>30))
prop.table(table(data>40))
prop.table(table(data>50))
prop.table(table(data>60))
prop.table(table(data>70))
prop.table(table(data>80))
prop.table(table(data>90))

Here is the output:

FALSE  TRUE 
  0.1   0.9 

FALSE  TRUE 
 0.16  0.84 

FALSE  TRUE 
 0.29  0.71 

FALSE  TRUE 
 0.36  0.64 

FALSE  TRUE 
 0.51  0.49 

FALSE  TRUE 
 0.61  0.39 

FALSE  TRUE 
 0.75  0.25 

FALSE  TRUE 
 0.86  0.14 

FALSE  TRUE 
 0.91  0.09 

But this is a crude and inefficient approach, as you would agree. Instead of calculating the percentage for each cutoff value one at a time, I want to build a plot of that relationship, where the X axis represents the range of all possible cutoff levels and the Y axis represents percentages from 0 to 100. Something similar to this:

[image: example plot]

Please ignore the axis labels etc. of the plot; it is only meant as a general example. Any suggestions?

Oposum asked Dec 24 '22 06:12



1 Answer

I believe you are looking for the ecdf() function, which builds an empirical cumulative distribution function from the data.

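data <- rnorm(1000, 50, 30)
a <- ecdf(data)  # step function: proportion of observations <= x
plot(a)          # cumulative proportion (0 to 1) against value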

[image: example ECDF plot]
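
One detail worth spelling out: ecdf() gives the proportion of observations at or below each value, while the question asks for the percentage above a cutoff, which is one minus that. The following is a minimal sketch of that adjustment, not part of the original answer. The helper name pct_above, the set.seed() call, and the axis labels are assumptions added for illustration.

```r
# Simulated data, as in the question (seed added here only for reproducibility).
set.seed(1)
data <- rnorm(1000, 50, 30)

# ecdf() returns a step function F with F(x) = proportion of observations <= x.
a <- ecdf(data)

# The percentage strictly above a cutoff is therefore 100 * (1 - F(cutoff)).
# pct_above is a hypothetical helper name, not a base R function.
pct_above <- function(cutoff) 100 * (1 - a(cutoff))

# Percentages at the cutoffs used in the question, computed in one vectorized call.
cutoffs <- seq(10, 90, by = 10)
round(pct_above(cutoffs), 1)

# Sanity check against the direct calculation for one cutoff.
stopifnot(isTRUE(all.equal(pct_above(50), 100 * mean(data > 50))))

# Plot the whole curve: cutoff on the x axis, percent above it on the y axis.
x <- sort(data)
plot(x, pct_above(x), type = "s",
     xlab = "Cutoff", ylab = "Observations above cutoff (%)",
     ylim = c(0, 100))
```

Because the function returned by ecdf() is vectorized, pct_above() can be evaluated at any set of cutoffs without repeating prop.table(table(...)) for each one.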

vincentmajor answered Dec 26 '22 19:12
