
Create ROC curve manually from data frame

I have the below conceptual problem which I can't get my head around.

Below is an example for survey data where I have a time column that indicates how long someone needs to respond to a certain question.

Now, I'm interested in how the amount of cleaning would change with the time threshold used to flag a case, i.e. what would happen if I increased or decreased that threshold.

So my idea was to create a ROC curve (or other model metrics) to get a visual cue about a potential threshold. The problem is that I don't have a machine-learning-like model that would give me class probabilities, so I was wondering if there's any way to create a ROC curve nonetheless with this type of data. I had the idea of looping through my data at maybe 100 different thresholds, calculating the false and true positive rates at each threshold and then drawing a simple line plot, but I was hoping for a more elegant solution that doesn't require me to loop.

Any ideas?

example data:

  • time column indicates the time needed per case
  • truth column indicates my current decision, which I want to compare against
  • predicted column indicates the cleaning decision if I cut at a time threshold of 2.5s. This is what I need to change/loop through.

library(dplyr)  # provides %>%, mutate() and if_else()

set.seed(3)
df <- data.frame(time      = c(2.5 + rnorm(5), 3.5 + rnorm(5)),
                 truth     = rep(c("cleaned", "final"), each = 5)) %>%
  mutate(predicted = if_else(time < 2.5, "cleaned", "final"))
deschen asked May 25 '26 23:05


1 Answer

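You can use the ROCR package for this. Pass the raw time values as the score: ROCR evaluates every observed value as a cutoff, so you need neither class probabilities nor an explicit loop.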

library(ROCR)
library(dplyr)  # provides %>%, mutate() and if_else()

set.seed(3)
df <- data.frame(time      = c(2.5 + rnorm(5), 3.5 + rnorm(5)),
                 truth     = rep(c("cleaned", "final"), each = 5)) %>%
  mutate(predicted = if_else(time < 2.5, "cleaned", "final"))

# time is the score, truth is the label; every observed time serves as a cutoff
pred <- prediction(df$time, df$truth)
perf <- performance(pred, "tpr", "fpr")
plot(perf, colorize = TRUE)

[Figure: ROC curve]
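Since the goal is to pick a time threshold, it may help to look at the cutoffs behind each point of the curve. The sketch below reads them from the slots of the ROCR performance object (`alpha.values`, `x.values`, `y.values`). It rebuilds the example data with base R only, and it assumes that "final" is treated as the positive class and that a case counts as "final" when its time is at or above the cutoff, which is how I understand ROCR's default convention. Picking the cutoff by `tpr - fpr` (Youden's J) is only an illustrative choice, not something the question prescribes.

```r
library(ROCR)

# Same example data as in the question, built without dplyr
set.seed(3)
df <- data.frame(time  = c(2.5 + rnorm(5), 3.5 + rnorm(5)),
                 truth = rep(c("cleaned", "final"), each = 5))

pred <- prediction(df$time, df$truth)
perf <- performance(pred, "tpr", "fpr")

# One row per point on the curve; the first cutoff is Inf,
# i.e. no case is classified as positive yet
cutoffs <- data.frame(cutoff = perf@alpha.values[[1]],
                      fpr    = perf@x.values[[1]],
                      tpr    = perf@y.values[[1]])

# Illustrative selection rule (an assumption): maximise tpr - fpr
best <- cutoffs[which.max(cutoffs$tpr - cutoffs$fpr), ]
print(cutoffs)
print(best)
```

Each row of `cutoffs` then shows what the true and false positive rates would be if the cleaning threshold were moved to that time value.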

You can also check the AUC value:

auc <- performance(pred, measure = "auc")
auc@y.values[[1]]

[1] 0.92

Cross-checking the AUC value with pROC:

library(pROC)

roc(df$truth, df$time)

Call:
roc.default(response = df$truth, predictor = df$time)

Data: df$time in 5 controls (df$truth cleaned) < 5 cases (df$truth final).
Area under the curve: 0.92

Both packages give the same AUC.
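As a rough cross-check that depends on neither package, the same curve can be sketched in base R by sorting the times once and taking cumulative sums, which avoids an explicit loop over thresholds. This is only a sketch under stated assumptions: `manual_roc()` and `manual_auc()` are hypothetical helper names (not part of ROCR or pROC), "final" is taken as the positive class, and a case is predicted positive when its time is at or above the threshold.

```r
# Hypothetical helper: ROC points for the rule "positive when score >= threshold"
manual_roc <- function(score, is_positive) {
  ord <- order(score, decreasing = TRUE)
  score <- score[ord]
  is_positive <- is_positive[ord]

  # Cumulative true/false positive counts as the threshold is lowered
  tp <- cumsum(is_positive)
  fp <- cumsum(!is_positive)

  # For tied scores keep only the last row of each run
  keep <- !duplicated(score, fromLast = TRUE)

  data.frame(threshold = c(Inf, score[keep]),
             tpr       = c(0, tp[keep] / sum(is_positive)),
             fpr       = c(0, fp[keep] / sum(!is_positive)))
}

# Hypothetical helper: trapezoidal area under the ROC points
manual_auc <- function(roc_df) {
  sum(diff(roc_df$fpr) * (head(roc_df$tpr, -1) + tail(roc_df$tpr, -1)) / 2)
}

# Same example data as in the question, built without dplyr
set.seed(3)
df <- data.frame(time  = c(2.5 + rnorm(5), 3.5 + rnorm(5)),
                 truth = rep(c("cleaned", "final"), each = 5))

roc_df <- manual_roc(df$time, df$truth == "final")
auc_manual <- manual_auc(roc_df)
print(roc_df)
print(auc_manual)
```

If the sketch is right, `auc_manual` should agree with the AUC reported by ROCR and pROC for the same data, and `plot(roc_df$fpr, roc_df$tpr, type = "l")` draws the simple line plot the question had in mind.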

Shibaprasadb answered May 28 '26 11:05
