Let's say I have a response variable that rises and falls over time. Each time the response variable rises above a threshold, we have a new "Trial." That is, if I add a column Threshold that is TRUE whenever the response is above a certain value, each consecutive block of data points where Threshold is TRUE constitutes a new trial.
Time <- seq(1, 10, by = 0.5)
Response <- abs(sin(Time))
Threshold <- Response > 0.6
data <- data.frame(Time, Response, Threshold)
Given Time, Response, and Threshold, how could I go about adding a Trial factor that has a new value for each group of TRUE thresholds? Something like this:
Time Response Threshold Trial
1 1.0 0.84147098 TRUE A
2 1.5 0.99749499 TRUE A
3 2.0 0.90929743 TRUE A
4 2.5 0.59847214 FALSE NA
5 3.0 0.14112001 FALSE NA
6 3.5 0.35078323 FALSE NA
7 4.0 0.75680250 TRUE B
8 4.5 0.97753012 TRUE B
9 5.0 0.95892427 TRUE B
10 5.5 0.70554033 TRUE B
11 6.0 0.27941550 FALSE NA
12 6.5 0.21511999 FALSE NA
13 7.0 0.65698660 TRUE C
14 7.5 0.93799998 TRUE C
15 8.0 0.98935825 TRUE C
16 8.5 0.79848711 TRUE C
17 9.0 0.41211849 FALSE NA
18 9.5 0.07515112 FALSE NA
19 10.0 0.54402111 FALSE NA
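# cumsum(!Threshold) counts the FALSE rows seen so far, so it stays constant
# within each run of TRUE rows and differs between runs (0, 3, 5 here).
# The labels are hard-coded for the three trials in this example.
data$Trial <- factor(
  ifelse(data$Threshold, cumsum(!data$Threshold), NA), labels = c("A", "B", "C")
)
data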
## Time Response Threshold Trial
## 1 1.0 0.84147098 TRUE A
## 2 1.5 0.99749499 TRUE A
## 3 2.0 0.90929743 TRUE A
## 4 2.5 0.59847214 FALSE <NA>
## 5 3.0 0.14112001 FALSE <NA>
## 6 3.5 0.35078323 FALSE <NA>
## 7 4.0 0.75680250 TRUE B
## 8 4.5 0.97753012 TRUE B
## 9 5.0 0.95892427 TRUE B
## 10 5.5 0.70554033 TRUE B
## 11 6.0 0.27941550 FALSE <NA>
## 12 6.5 0.21511999 FALSE <NA>
## 13 7.0 0.65698660 TRUE C
## 14 7.5 0.93799998 TRUE C
## 15 8.0 0.98935825 TRUE C
## 16 8.5 0.79848711 TRUE C
## 17 9.0 0.41211849 FALSE <NA>
## 18 9.5 0.07515112 FALSE <NA>
## 19 10.0 0.54402111 FALSE <NA>
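The hard-coded labels above only fit data with exactly three trials. As a rough sketch of a more general variant (not part of the original answer), one could number the runs by counting the rows where a run of TRUE values starts, then derive the labels from that count. The names `starts` and `trial_id` are made up for illustration, and the sketch assumes at least one and at most 26 trials, since it draws labels from `LETTERS`.

```r
# Rebuild the example data so the block runs on its own.
Time <- seq(1, 10, by = 0.5)
Response <- abs(sin(Time))
Threshold <- Response > 0.6
data <- data.frame(Time, Response, Threshold)

# A run starts where Threshold is TRUE and the previous row was FALSE
# (the first row is treated as if preceded by FALSE).
starts <- data$Threshold & !c(FALSE, head(data$Threshold, -1))

# cumsum(starts) gives 1, 2, 3, ... per run; rows below the threshold get NA.
trial_id <- ifelse(data$Threshold, cumsum(starts), NA)

# Assumption: between 1 and 26 trials, so LETTERS supplies enough labels.
n_trials <- max(trial_id, na.rm = TRUE)
data$Trial <- factor(trial_id, labels = LETTERS[seq_len(n_trials)])
data
```

Because the run numbers are consecutive integers starting at 1, the number of labels always matches the number of factor levels, whatever the data look like.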
Another possibility using rle:
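# Run-length encode Threshold: lengths and values of consecutive runs
r <- with(data, rle(Threshold))
# Keep only the lengths of the TRUE runs (one per trial)
len <- with(r, lengths[values])
n <- length(len)
# Repeat one letter per trial, as many times as that trial has rows
trial <- rep(x = LETTERS[1:n], times = len)
# Fill in the labels for the TRUE rows; the remaining rows stay NA
data$Trial[data$Threshold] <- trial
data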
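To see why the rle approach works, it may help to look at what `rle()` returns for this particular Threshold vector. The snippet below is only an illustration on the example data from the question; it rebuilds Threshold so it can be run on its own.

```r
# Same Threshold vector as in the question.
Threshold <- abs(sin(seq(1, 10, by = 0.5))) > 0.6

r <- rle(Threshold)
r$lengths            # 3 3 4 2 4 3
r$values             # TRUE FALSE TRUE FALSE TRUE FALSE

# Lengths of the TRUE runs only: one entry per trial.
true_len <- r$lengths[r$values]
true_len             # 3 4 4

# One letter per trial, repeated for each row in that trial.
rep(LETTERS[seq_along(true_len)], times = true_len)
```

The final vector has exactly as many elements as there are TRUE rows, which is why it can be assigned directly into `data$Trial[data$Threshold]`.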