I'm a relative R noob.
I've got a big dataset that looks something like this:
Tempadjvolt newmass rgdeltas
2794 498.5777 0.5355647187 0.00000000
2795 499.7577 0.5355647187 0.00000000
2796 500.7877 0.3415104788 -2.87487763
2797 502.1177 0.4312854788 -1.54487763
2798 500.3877 0.5355647187 0.00000000
2799 502.5377 0.4596354788 -1.12487763
2800 507.6877 0.8072604788 4.02512237
2801 505.2577 0.6432354788 1.59512237
2802 505.7977 0.6796854788 2.13512237
2803 517.8877 1.4957604788 14.22512237
2804 502.2477 0.4400604788 -1.41487763
2805 507.3677 0.7856604788 3.70512237
2806 519.7277 1.6199604788 16.06512237
2807 528.9377 2.2416354788 25.27512237
2808 520.2677 1.6564104788 16.60512237
2809 519.3877 0.5355647187 0.00000000
2810 526.5677 2.0816604788 22.90512237
2811 519.5377 0.5355647187 0.00000000
2812 526.9277 2.1059604788 23.26512237
2813 529.9877 2.3125104788 26.32512237
2814 514.4077 1.2608604788 10.74512237
2815 518.3777 1.5288354788 14.71512237
I'm trying to identify negative rgdeltas values [for example, row 2804] and then 'look' 7 positions behind and ahead to find the highest Tempadjvolt, and set row 2804's Tempadjvolt to that local max.
The frame is ~4000 rows long, of which ~515 have negative rgdeltas values. I tried a couple of for loops that sorta worked... but they also spat out a bunch of NAs, which makes me think they were poorly/improperly constructed.
Any assistance would be greatly appreciated.
As was pointed out in the comments, the original post was unclear. I'm not concerned about consecutive negative rgdeltas values. For negative values within 7 rows of the start or end of the frame, ideally the loop would look as many positions forward and back as are available before hitting the beginning/end. I'm less concerned with that at this point.
A little more background: This is part of a signal processing program originally written in C# that I'm attempting to move to R to make it easier to batch process a large number of files output from an environmental monitor. I didn't write the original code, and this is only one small component of a much larger set of stuff going on.
I appreciate the help. Thanks!
1) Zero Fill. Assuming that the data frame is called DF, we use rollapply in the zoo package to apply a function, f, to a moving window of width 15:
library(zoo)
# columns of DF are (1) Tempadjvolt, (2) newmass and (3) rgdeltas
f <- function(x) if (x[8, 3] < 0) max(x[, 1]) else x[8, 1]
DF[[1]] <- rollapply(DF, 15, f, fill = 0, by.column = FALSE)
In the above we filled the points near the beginning and end with zeros, since it seems the precise way of dealing with them is not so important, but we could have used some other fill value.
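As a sketch of the "some other fill value" idea: filling with NA instead of 0 makes the unprocessed end rows easy to spot, and they can then be set back to their original readings. The 20-row data frame here is made up purely for illustration, and `res` and `fixed` are names introduced for this example only.

```r
library(zoo)

# Made-up 20-row frame, for illustration only (not the asker's data)
DF <- data.frame(Tempadjvolt = as.numeric(1:20),
                 newmass     = 0,
                 rgdeltas    = ifelse(1:20 == 10, -1, 0))

# Same f as above: row 8 of each 15-row window is the current row
f <- function(x) if (x[8, 3] < 0) max(x[, 1]) else x[8, 1]

# NA marks the 7 rows at each end where no full window exists
res <- rollapply(DF, 15, f, fill = NA, by.column = FALSE)

# Keep the original reading wherever the window was incomplete
fixed <- ifelse(is.na(res), DF$Tempadjvolt, res)
```

Only row 10 has a negative rgdeltas here, so it is the only row whose value should change (to the largest Tempadjvolt in rows 3 to 17).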
2) Leave end values. Another possibility is to only process the points not near the ends:
DF[seq(8, nrow(DF)-7), 1] <- rollapply(DF, 15, f, by.column = FALSE)
3) Partials. Or we could have used partial = TRUE and then taken the max of the partial windows near the ends, like this:
f2 <- function(x) {
    # Columns of DF2 are (1) Tempadjvolt, (2) newmass, (3) rgdeltas and (4) seq.
    # Condition is TRUE if passed a partial x near the beginning.
    # k is row index of current row in x. Normally 8 but near start it varies.
    k <- if (x[1, 4] == 1) nrow(x) - 7 else 8
    if (x[k, 3] < 0) max(x[, 1]) else x[k, 1]
}
DF2 <- cbind(DF, seq = 1:nrow(DF))
DF[[1]] <- rollapply(DF2, 15, f2, partial = TRUE, by.column = FALSE)
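For cross-checking the partial-window result without zoo, the same idea can be sketched as a plain base R loop. This is only an illustration under my own assumptions: `local_max_fix`, its arguments and the tiny vectors below are hypothetical and not part of zoo or of the original data.

```r
# Hypothetical helper (not from zoo): base R version of the partial-window idea.
# volt  : the readings to correct (e.g. DF$Tempadjvolt)
# delta : the values whose sign triggers the correction (e.g. DF$rgdeltas)
# half  : how many rows to look behind and ahead (7 in the question)
local_max_fix <- function(volt, delta, half = 7) {
    n <- length(volt)
    out <- volt
    for (i in which(delta < 0)) {
        lo <- max(1, i - half)       # clamp the window at the start
        hi <- min(n, i + half)       # clamp the window at the end
        out[i] <- max(volt[lo:hi])   # read from the original, untouched vector
    }
    out
}

# Made-up values, for illustration only
volt  <- c(5, 1, 9, 2, 3)
delta <- c(0, -1, 0, 0, -2)
local_max_fix(volt, delta, half = 1)
# [1] 5 9 9 2 3
```

On the real data it would be called as something like DF$Tempadjvolt <- local_max_fix(DF$Tempadjvolt, DF$rgdeltas).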
Assume the data frame's name is dat:
negidxs <- as.numeric( rownames(dat)[ dat[[3]] < 0 ] )
for ( i in negidxs ){
    # rows whose names fall within 7 of row i; rows outside the frame are simply absent
    dat[as.character(i), "Tempadjvolt"] <-
        max(dat[rownames(dat) %in% (i-7):(i+7), "Tempadjvolt"], na.rm=TRUE) }
dat
#----------------------------------#
Tempadjvolt newmass rgdeltas
2794 498.5777 0.5355647 0.000000
2795 499.7577 0.5355647 0.000000
2796 517.8877 0.3415105 -2.874878
2797 517.8877 0.4312855 -1.544878
2798 500.3877 0.5355647 0.000000
2799 519.7277 0.4596355 -1.124878
2800 507.6877 0.8072605 4.025122
2801 505.2577 0.6432355 1.595122
2802 505.7977 0.6796855 2.135122
#snipped-----
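One thing to be aware of: the loop writes into dat as it goes, so a later window can pick up a value that an earlier iteration already replaced. If every replacement should be based on the original readings instead, a minimal variation is to read from a snapshot taken before the loop. The names `orig`, `rn` and `in_window` are introduced here for illustration, and the small dat below is just the first ten rows from the question (rgdeltas rounded).

```r
# First ten rows of the question's data, rebuilt so the example runs on its own
dat <- data.frame(
    Tempadjvolt = c(498.5777, 499.7577, 500.7877, 502.1177, 500.3877,
                    502.5377, 507.6877, 505.2577, 505.7977, 517.8877),
    rgdeltas    = c(0, 0, -2.874878, -1.544878, 0,
                    -1.124878, 4.025122, 1.595122, 2.135122, 14.225122),
    row.names   = 2794:2803
)

orig <- dat$Tempadjvolt               # snapshot taken before any replacement
rn   <- as.numeric(rownames(dat))     # numeric row labels

for (i in rn[which(dat$rgdeltas < 0)]) {
    in_window <- rn >= i - 7 & rn <= i + 7   # rows within 7 of row i
    dat$Tempadjvolt[rn == i] <- max(orig[in_window], na.rm = TRUE)
}
dat
```

Like the original loop, this relies on the row names being numeric and consecutive.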