Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Merge data, set NA values, and replace NA values

Tags:

merge

replace

r

na

I have found out that my data set is not consistent. The data Frame is called DF1 For instance there is a NA value for time 9:49, but time 9:48 does not exist at all. It Looks like this:

Time                |  1  |  2
2016-05-11 09:45:00 | NA  | NA
2016-05-11 09:46:00 | 4.4 | 6.6
2016-05-11 09:47:00 | 5.8 | 7.0
2016-05-11 09:49:00 | NA  | NA
2016-05-11 09:50:00 | 5.6 | 7.3
2016-05-11 09:51:00 | 7.4 | 7.4

So I have created a new data Frame comprising all Dates and minutes consistently:

D2 = as.data.frame( seq( from = as.POSIXct("2016-05-11 09:45", tz = "GMT"), to = as.POSIXct("2016-05-11 09:50", tz = "GMT"), by = "min"))

How can I merge D2 and DF1 such that the values in column 1 and 2 at time 9:48 will turn into NA values. It should look like this:

Time                |  1  |  2
2016-05-11 09:45:00 | NA  | NA
2016-05-11 09:46:00 | 4.4 | 6.6
2016-05-11 09:47:00 | 5.8 | 7.0
2016-05-11 09:48:00 | NA  | NA
2016-05-11 09:49:00 | NA  | NA
2016-05-11 09:50:00 | 5.6 | 7.3
2016-05-11 09:51:00 | 7.4 | 7.4

In the next step I want to replace the NA values. If it is one NA value it should be replaced by the mean of the previous element and the next one. If there are two NA values the two values should be replaced by the mean of the 2 previous and two next values. If there are three, up until four NA values. If there are more then 4 consecutive NA values the values must not be changed. My new Data should look like this:

Time                |  1  |  2
2016-05-11 09:45:00 | NA  | NA  --> no previous value --> NA
2016-05-11 09:46:00 | 4.4 | 6.6
2016-05-11 09:47:00 | 5.8 | 7.0
2016-05-11 09:48:00 | 5.8 | 7.1 --> column 1 (4.4+5.8+5.6 + 7.4)/4
2016-05-11 09:49:00 | 5.8 | 7.1 --> column 2 (7.3+ 7.4 + 6.6 + 7.0)/4  
2016-05-11 09:50:00 | 5.6 | 7.3
2016-05-11 09:51:00 | 7.4 | 7.4
like image 398
Carolus Fridericus Avatar asked Nov 24 '25 23:11

Carolus Fridericus


1 Answers

Create data

# I edited you data slightly to include an example of a single row with missing values
df1 <- readr::read_csv(
"time,var1,var2
2016-05-11 09:45:00,NA,NA
2016-05-11 09:46:00,4.4,6.6
2016-05-11 09:47:00,NA,NA
2016-05-11 09:48:00,5.6,7.4
2016-05-11 09:49:00,5.8,7.0
2016-05-11 09:51:00,NA,NA
2016-05-11 09:52:00,5.6,7.3
2016-05-11 09:53:00,7.4,7.4")

Fill in missing rows

A nice way to solve these sorts of problems, making rows implicitly missing observations explicitly missing, is to use tidyr::complete.

library(tidyr)
library(lubridate)

df2 <- df1 %>% 
  complete(time = seq(ymd_hm("2016-05-11 09:45"), ymd_hm("2016-05-11 09:53"), by = "min"))

df2

#> # A tibble: 9 x 3
#>                  time  var1  var2
#>                <dttm> <dbl> <dbl>
#> 1 2016-05-11 09:45:00    NA    NA
#> 2 2016-05-11 09:46:00   4.4   6.6
#> 3 2016-05-11 09:47:00    NA    NA
#> 4 2016-05-11 09:48:00   5.6   7.4
#> 5 2016-05-11 09:49:00   5.8   7.0
#> 6 2016-05-11 09:50:00    NA    NA
#> 7 2016-05-11 09:51:00    NA    NA
#> 8 2016-05-11 09:52:00   5.6   7.3
#> 9 2016-05-11 09:53:00   7.4   7.4

Replace missing values

I'm not exactly sure what to do in the more complicated cases where there are sequential rows with missing values, but here is a way to solve the more simple cases where it's only a single row. lag and lead from dplyr might be helpful though.

library(dplyr)

df2 %>% 
  mutate(lag1 = lag(var1, n = 1L, order_by = time),
         lead1 = lead(var1, n = 1L, order_by = time)) %>% 
  rowwise() %>% 
  mutate(var1_edit = if_else(is.na(var1), mean(c(lag1, lead1)), var1)) %>% 
  select(time, var1, var1_edit) %>% # for simplicity focusing on one column
  ungroup()

#>                  time var1 var1_edit
#> 1 2016-05-11 09:45:00   NA        NA
#> 2 2016-05-11 09:46:00  4.4       4.4
#> 3 2016-05-11 09:47:00   NA       5.0
#> 4 2016-05-11 09:48:00  5.6       5.6
#> 5 2016-05-11 09:49:00  5.8       5.8
#> 6 2016-05-11 09:50:00   NA        NA
#> 7 2016-05-11 09:51:00   NA        NA
#> 8 2016-05-11 09:52:00  5.6       5.6
#> 9 2016-05-11 09:53:00  7.4       7.4
like image 198
austensen Avatar answered Nov 26 '25 13:11

austensen