I'm quite new to the packages meant for calculating rolling averages in R, and I hope you can point me in the right direction.
I have the following data as an example:
ms <- c(300, 300, 300, 301, 303, 305, 305, 306, 308, 310, 310, 311, 312,
314, 315, 315, 316, 316, 316, 317, 318, 320, 320, 321, 322, 324,
328, 329, 330, 330, 330, 332, 332, 334, 334, 335, 335, 336, 336,
337, 338, 338, 338, 340, 340, 341, 342, 342, 342, 342)
correct <- c(1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0,
1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1,
1, 0, 0, 1, 0, 0, 1, 1, 0, 0)
df <- data.frame(ms, correct)
ms are time points in milliseconds and correct is whether a specific action is performed correctly (1 = correct, 0 = not correct).
My goal is to calculate the percentage correct (or the average) over windows of a set number of milliseconds. As you can see, certain time points are missing and certain time points occur multiple times, so I do not want to filter based on row number. I've looked into packages such as "tidyquant", but it seems these kinds of packages need a time/date variable rather than a numeric variable to determine the window over which values are averaged. Is there a way to specify the window based on the numeric value of df$ms?
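To make the goal concrete, here is a brute-force version of what I'm after (w is just a placeholder window width); I'm hoping a package can do this more idiomatically:

w <- 10  # placeholder window width in ms
starts <- seq(min(df$ms), max(df$ms))
data.frame(start = starts,
           share_correct = sapply(starts, function(s)
             mean(df$correct[df$ms >= s & df$ms <= s + w - 1])))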
For the sake of completeness, here is an answer which uses data.table to aggregate in a non-equi join.
The OP has clarified in comments that he is looking for a sliding window of 5 ms, i.e., windows covering 300-304, 301-305, 302-306, etc.
As there is no data point at 302 ms in the OP's data set, the missing start points need to be filled in.
library(data.table)
ws <- 5 # define window size
# create one row per possible start point, derive the window end,
# then right join df on the closed interval and aggregate per window
setDT(df)[SJ(start = seq(min(ms), max(ms), 1))[, end := start + ws - 1],
          on = .(ms >= start, ms <= end),
          .(share_correct = mean(correct)), by = .EACHI]
     ms  ms share_correct
 1: 300 304     0.4000000
 2: 301 305     0.0000000
 3: 302 306     0.2500000
 4: 303 307     0.2500000
 5: 304 308     0.2500000
 6: 305 309     0.2500000
 7: 306 310     0.2500000
 8: 307 311     0.0000000
 9: 308 312     0.2000000
10: 309 313     0.2500000
11: 310 314     0.2000000
12: 311 315     0.4000000
13: 312 316     0.4285714
14: 313 317     0.2857143
15: 314 318     0.3750000
16: 315 319     0.4285714
17: 316 320     0.4285714
18: 317 321     0.4000000
19: 318 322     0.4000000
20: 319 323     0.2500000
21: 320 324     0.4000000
22: 321 325     0.3333333
23: 322 326     0.5000000
24: 323 327     1.0000000
25: 324 328     1.0000000
26: 325 329     0.5000000
27: 326 330     0.2000000
28: 327 331     0.2000000
29: 328 332     0.4285714
30: 329 333     0.3333333
31: 330 334     0.2857143
32: 331 335     0.5000000
33: 332 336     0.3750000
34: 333 337     0.2857143
35: 334 338     0.3000000
36: 335 339     0.3750000
37: 336 340     0.3750000
38: 337 341     0.4285714
39: 338 342     0.4000000
40: 339 343     0.4285714
41: 340 344     0.4285714
42: 341 345     0.4000000
43: 342 346     0.5000000
     ms  ms share_correct
If the OP is interested only in windows whose starting point exists in the dataset, the code can be simplified:
# same as above, but use only start points which actually appear in the data
setDT(df)[SJ(start = unique(ms))[, end := start + ws - 1],
          on = .(ms >= start, ms <= end),
          .(share_correct = mean(correct)), by = .EACHI]
     ms  ms share_correct
 1: 300 304     0.4000000
 2: 301 305     0.0000000
 3: 303 307     0.2500000
 4: 305 309     0.2500000
 5: 306 310     0.2500000
 6: 308 312     0.2000000
 7: 310 314     0.2000000
 8: 311 315     0.4000000
 9: 312 316     0.4285714
10: 314 318     0.3750000
11: 315 319     0.4285714
12: 316 320     0.4285714
13: 317 321     0.4000000
14: 318 322     0.4000000
15: 320 324     0.4000000
16: 321 325     0.3333333
17: 322 326     0.5000000
18: 324 328     1.0000000
19: 328 332     0.4285714
20: 329 333     0.3333333
21: 330 334     0.2857143
22: 332 336     0.3750000
23: 334 338     0.3000000
24: 335 339     0.3750000
25: 336 340     0.3750000
26: 337 341     0.4285714
27: 338 342     0.4000000
28: 340 344     0.4285714
29: 341 345     0.4000000
30: 342 346     0.5000000
     ms  ms share_correct
In both cases, a data.table containing the intervals [start, end] is created on the fly and right joined to df. During the non-equi join, the intermediate result is immediately grouped by the join parameters (by = .EACHI) and aggregated. Note that closed intervals are used to be in line with the OP's expectations.
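As an alternative cross-check, the slider package handles windows defined on an index column (rather than on row numbers) directly. A minimal sketch, assuming slider is installed, which reproduces the second result with one value per existing start point:

library(slider)
# closed windows [ms, ms + ws - 1] on the ms index, one value per row of df
# (assumes df has already been converted with setDT() as above)
df[, share_correct := slide_index_dbl(correct, ms, mean, .after = ws - 1)]
unique(df[, .(ms, share_correct)])  # collapse duplicated ms values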