I am trying to take the the following dataset and convert it to the second. Basically, I'm trying to fill in the NA's between each ID with that ID.
Each ID corresponds to two time stamps, which I've joined in to a larger date_time column. For purposes of reproduce-ability, it's too computationally expensive to do a sql between join (the date_time column is very large) or even to take the original dataset and create the time stamps between each id and then join it in (I have too many ID's to do this). I've successfully completed both those methods and it just takes too much time for the amount of data I have. I'm hoping to manipulate the data with this data set. It seems like such a simple thing, but it's really got me stumped. Any help would be appreciated.
Current data set:
date_time id
<dttm> <dbl>
1 2017-01-30 08:00:00 NA
2 2017-01-30 08:00:01 NA
3 2017-01-30 08:00:02 1
4 2017-01-30 08:00:03 NA
5 2017-01-30 08:00:04 NA
6 2017-01-30 08:00:05 NA
7 2017-01-30 08:00:06 NA
8 2017-01-30 08:00:07 1
9 2017-01-30 08:00:08 NA
10 2017-01-30 08:00:09 NA
11 2017-01-30 08:00:10 2
12 2017-01-30 08:00:11 NA
13 2017-01-30 08:00:12 NA
14 2017-01-30 08:00:13 NA
15 2017-01-30 08:00:14 2
16 2017-01-30 08:00:15 NA
17 2017-01-30 08:00:16 3
18 2017-01-30 08:00:17 NA
19 2017-01-30 08:00:18 3
20 2017-01-30 08:00:19 NA
Desired dataset:
date_time id
<dttm> <dbl>
1 2017-01-30 08:00:00 NA
2 2017-01-30 08:00:01 NA
3 2017-01-30 08:00:02 1
4 2017-01-30 08:00:03 1
5 2017-01-30 08:00:04 1
6 2017-01-30 08:00:05 1
7 2017-01-30 08:00:06 1
8 2017-01-30 08:00:07 1
9 2017-01-30 08:00:08 NA
10 2017-01-30 08:00:09 NA
11 2017-01-30 08:00:10 2
12 2017-01-30 08:00:11 2
13 2017-01-30 08:00:12 2
14 2017-01-30 08:00:13 2
15 2017-01-30 08:00:14 2
16 2017-01-30 08:00:15 NA
17 2017-01-30 08:00:16 3
18 2017-01-30 08:00:17 3
19 2017-01-30 08:00:18 3
20 2017-01-30 08:00:19 NA
dput() date:
structure(list(date_time = structure(c(1485781200, 1485781201,
1485781202, 1485781203, 1485781204, 1485781205, 1485781206, 1485781207,
1485781208, 1485781209, 1485781210, 1485781211, 1485781212, 1485781213,
1485781214, 1485781215, 1485781216, 1485781217, 1485781218, 1485781219
), class = c("POSIXct", "POSIXt"), tzone = ""), trx_id = c(NA_real_,
NA_real_, 1, NA_real_, NA_real_, NA_real_, NA_real_, 1,
NA_real_, NA_real_, 2, NA_real_, NA_real_, NA_real_, 2,
NA_real_, 3, NA_real_, 3, NA_real_)), .Names = c("date_time",
"trx_id"), row.names = c(NA, -20L), class = c("tbl_df", "tbl",
"data.frame"))
One solution could be using fill
function from tidyr
. The approach is simple. First create 2 columns one each for prev
and next
values. Use fill
to populate missing values in both columns.
Now, for rows which has same value in both prev_val
and next_val
the value should be updated with prev_val
(that means those missing values are between same number)
df <- read.table(text = "sl date_time, value
1 '2017-01-30 08:00:00' NA
2 '2017-01-30 08:00:01' NA
3 '2017-01-30 08:00:02' 1
4 '2017-01-30 08:00:03' NA
5 '2017-01-30 08:00:04' NA
6 '2017-01-30 08:00:05' NA
7 '2017-01-30 08:00:06' NA
8 '2017-01-30 08:00:07' 1
9 '2017-01-30 08:00:08' NA
10 '2017-01-30 08:00:09' NA
11 '2017-01-30 08:00:10' 2
12 '2017-01-30 08:00:11' NA
13 '2017-01-30 08:00:12' NA
14 '2017-01-30 08:00:13' NA
15 '2017-01-30 08:00:14' 2
16 '2017-01-30 08:00:15' NA
17 '2017-01-30 08:00:16' 3
18 '2017-01-30 08:00:17' NA
19 '2017-01-30 08:00:18' 3
20 '2017-01-30 08:00:19' NA", header = T, stringsAsFactor = F)
#use fill to find missing values
df %>%
mutate(prev_val = (value), next_val = (value)) %>%
fill(prev_val, .direction = "down") %>%
fill(next_val, .direction = "up") %>%
mutate(value = ifelse(prev_val == next_val, prev_val, value )) %>%
select(-prev_val, -next_val)
Result:
sl date_time. value
1 1 2017-01-30 08:00:00 NA
2 2 2017-01-30 08:00:01 NA
3 3 2017-01-30 08:00:02 1
4 4 2017-01-30 08:00:03 1
5 5 2017-01-30 08:00:04 1
6 6 2017-01-30 08:00:05 1
7 7 2017-01-30 08:00:06 1
8 8 2017-01-30 08:00:07 1
9 9 2017-01-30 08:00:08 NA
10 10 2017-01-30 08:00:09 NA
11 11 2017-01-30 08:00:10 2
12 12 2017-01-30 08:00:11 2
13 13 2017-01-30 08:00:12 2
14 14 2017-01-30 08:00:13 2
15 15 2017-01-30 08:00:14 2
16 16 2017-01-30 08:00:15 NA
17 17 2017-01-30 08:00:16 3
18 18 2017-01-30 08:00:17 3
19 19 2017-01-30 08:00:18 3
20 20 2017-01-30 08:00:19 NA
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With