If there are a certain number of consecutive NAs in a column, then replace the values

Tags:

r

na

I have a tibble with a column called meanSR_strong and another called meanSR_weak. If there are 10 or more consecutive NAs in the meanSR_strong column, I would like to replace those values with the values from the same rows of the meanSR_weak column, even if the replacement values are also NA. If there are fewer than 10 consecutive NAs in the meanSR_strong column, then I don't need to do any replacing.

For example, rows 3-6 are all NA, but that is only four in a row, so they should be left alone. However, rows 15-28 are all NA (14 in a row, which is more than 10), so I want to substitute values from the meanSR_weak column.

I know how to replace all the NAs, but I haven't figured out a nice way of coding this!

Here is my data:

x=structure(list(meanSR_strong = c(NA, 0.376009009009009, NA, NA, 
NA, NA, 0.615585585585586, NA, 0.607354054054054, 0.590210810810811, 
0.57005045045045, 0.596616216216216, 0.584066666666667, 0.538597297297297, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.639010810810811, 
0.634272972972973), meanSR_weak = c(0.574724324324324, 0.562030630630631, 
0.586247747747748, NA, NA, NA, 0.615585585585586, NA, 0.607354054054054, 
0.590210810810811, 0.57005045045045, 0.596616216216216, 0.608510810810811, 
0.538597297297297, NA, NA, NA, 0.555463063063063, 0.376715315315315, 
NA, NA, NA, NA, NA, NA, 0.60972972972973, NA, NA, 0.639010810810811, 
0.634272972972973), cloud.pct_strong = c(100, 36.036036036036, 
98.1981981981982, 100, 100, 100, 0, 100, 0, 0, 0, 0, 3.6036036036036, 
0, NA, NA, 100, 67.5675675675676, 100, 100, NA, 100, 100, 100, 
100, 74.7747747747748, 100, 100, 0, 0), cloud.pct_weak = c(0, 
0, 0, 100, 100, 100, 0, 100, 0, 0, 0, 0, 0, 0, NA, NA, 100, 0, 
36.036036036036, 67.5675675675676, NA, 100, 100, 100, 100, 0.900900900900901, 
100, 60.3603603603604, 0, 0), date = structure(c(951868800, 951955200, 
952041600, 952128000, 952214400, 952300800, 952387200, 952473600, 
952560000, 952646400, 952732800, 952819200, 952905600, 952992000, 
953078400, 953164800, 953251200, 953337600, 953424000, 953510400, 
953596800, 953683200, 953769600, 953856000, 953942400, 954028800, 
954115200, 954201600, 954288000, 954374400), class = c("POSIXct", 
"POSIXt"), tzone = "UTC")), .Names = c("meanSR_strong", "meanSR_weak", 
"cloud.pct_strong", "cloud.pct_weak", "date"), row.names = c(NA, 
-30L), class = c("tbl_df", "tbl", "data.frame"))

Ana asked Jan 28 '23 17:01


1 Answer

Base R's rle function (run-length encoding) can be used for this. First build an rle object (a list with "lengths" and "values" components; see ?rle) from the is.na() values of the column:

z <- rle(is.na(x$meanSR_strong))
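To make the structure of that object concrete, here is a toy illustration on a short made-up vector. The names v and r are hypothetical and not part of the question's data; this is only a sketch of what rle returns for a logical input.

```r
# Toy vector (made up for illustration, not the question's data)
v <- c(NA, 1, NA, NA, NA, 2)

# is.na(v) is TRUE FALSE TRUE TRUE TRUE FALSE; rle() collapses it into runs
r <- rle(is.na(v))

r$values   # TRUE FALSE  TRUE FALSE  (is the run an NA run?)
r$lengths  # 1 1 3 1                 (how long is each run?)
```

Each position in r$values describes one run, and the matching position in r$lengths says how many consecutive elements that run covers.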

Then change the z$values entries from TRUE to FALSE wherever the run of NAs is shorter than a threshold of your choosing. Here I choose 10:

z$values[z$lengths < 10 & z$values] <- FALSE

Then expand the runs back into a full-length logical vector with rep(), which here acts as the inverse of rle (base R also provides inverse.rle), and use it as a row index with the [<- function:

idx <- rep(z$values, z$lengths)   # TRUE only for rows inside a long NA run
x[idx, "meanSR_strong"] <- x[idx, "meanSR_weak"]

print(x, n=30)
# A tibble: 30 x 5
   meanSR_strong meanSR_weak cloud.pct_strong cloud.pct_weak       date
           <dbl>       <dbl>            <dbl>          <dbl>     <dttm>
 1            NA   0.5747243       100.000000      0.0000000 2000-03-01
 2     0.3760090   0.5620306        36.036036      0.0000000 2000-03-02
 3            NA   0.5862477        98.198198      0.0000000 2000-03-03
 4            NA          NA       100.000000    100.0000000 2000-03-04
 5            NA          NA       100.000000    100.0000000 2000-03-05
 6            NA          NA       100.000000    100.0000000 2000-03-06
 7     0.6155856   0.6155856         0.000000      0.0000000 2000-03-07
 8            NA          NA       100.000000    100.0000000 2000-03-08
 9     0.6073541   0.6073541         0.000000      0.0000000 2000-03-09
10     0.5902108   0.5902108         0.000000      0.0000000 2000-03-10
11     0.5700505   0.5700505         0.000000      0.0000000 2000-03-11
12     0.5966162   0.5966162         0.000000      0.0000000 2000-03-12
13     0.5840667   0.6085108         3.603604      0.0000000 2000-03-13
14     0.5385973   0.5385973         0.000000      0.0000000 2000-03-14
15            NA          NA               NA             NA 2000-03-15
16            NA          NA               NA             NA 2000-03-16
17            NA          NA       100.000000    100.0000000 2000-03-17
18     0.5554631   0.5554631        67.567568      0.0000000 2000-03-18
19     0.3767153   0.3767153       100.000000     36.0360360 2000-03-19
20            NA          NA       100.000000     67.5675676 2000-03-20
21            NA          NA               NA             NA 2000-03-21
22            NA          NA       100.000000    100.0000000 2000-03-22
23            NA          NA       100.000000    100.0000000 2000-03-23
24            NA          NA       100.000000    100.0000000 2000-03-24
25            NA          NA       100.000000    100.0000000 2000-03-25
26     0.6097297   0.6097297        74.774775      0.9009009 2000-03-26
27            NA          NA       100.000000    100.0000000 2000-03-27
28            NA          NA       100.000000     60.3603604 2000-03-28
29     0.6390108   0.6390108         0.000000      0.0000000 2000-03-29
30     0.6342730   0.6342730         0.000000      0.0000000 2000-03-30
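The steps above can also be wrapped into a small reusable function. This is only a sketch: the name fill_long_na_runs and its arguments primary, fallback and min_run are hypothetical, and it assumes both vectors have the same length. It uses inverse.rle from base R, which should give the same result as the rep() call.

```r
# Hypothetical helper: replace runs of at least `min_run` consecutive NAs
# in `primary` with the values at the same positions in `fallback`.
fill_long_na_runs <- function(primary, fallback, min_run = 10) {
  stopifnot(length(primary) == length(fallback))
  runs <- rle(is.na(primary))
  # Keep a run flagged only if it is an NA run AND long enough
  runs$values <- runs$values & runs$lengths >= min_run
  # Expand back to one logical per element (same as rep(values, lengths))
  long_run <- inverse.rle(runs)
  primary[long_run] <- fallback[long_run]
  primary
}

# Toy data (made up), using a threshold of 3 instead of 10
strong <- c(NA, 1, NA, NA, NA, 2)
weak   <- c(10, 11, 12, NA, 14, 15)
fill_long_na_runs(strong, weak, min_run = 3)
# NA  1 12 NA 14  2
```

With the question's tibble, the call would presumably be x$meanSR_strong <- fill_long_na_runs(x$meanSR_strong, x$meanSR_weak). Note that the single leading NA in the toy data is left alone because its run is shorter than the threshold, while the NA inside the long run is copied over from the fallback as-is.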

IRTFM answered Jan 31 '23 05:01