Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

R Filtering data using reference time table in R

I have a dataframe (example below) which has one time and 2 other variables

data<- data.frame(structure(list(datetime = c("7/17/2017 8:16:53", "7/17/2017 8:16:55", 
                            "7/17/2017 8:16:57", "7/17/2017 8:16:59", "7/17/2017 8:17:01", 
                            "7/17/2017 8:17:02", "7/17/2017 8:17:04", "7/17/2017 8:17:06", 
                            "7/17/2017 8:17:08", "7/17/2017 8:17:10", "7/17/2017 8:17:12", 
                            "7/17/2017 8:17:13", "7/17/2017 8:17:15", "7/17/2017 8:17:17", 
                            "7/17/2017 8:17:19", "7/17/2017 8:17:21", "7/17/2017 8:17:22", 
                            "7/17/2017 8:17:27", "7/17/2017 8:17:29", NA, NA), var1 = c(252.234873, 
                                                                                        254.0436836, 252.5279108, 252.4802478, 252.6377229, 253.8766496, 
                                                                                        249.8086397, 249.5646219, 249.1815691, 253.9509387, 251.7245156, 
                                                                                  251.8415925, 254.2059507, 253.9145112, 251.8415925, 254.2059507, 
                                                                                        253.9145112, 252.4802478, 252.6377229, NA, NA), var2 = c(582.5766695, 
                                                                                                                                                 583.0972735, 582.7872586, 582.312636, 579.6445667, 579.7995196, 
                                                                                                                                                 578.9574528, 576.5341483, 575.8460797, 574.2353493, 574.8998519, 
                                                                                                                                                 574.1717159, 573.8133058, 574.6849578, 574.1717159, 573.8133058, 
                                                                                                                                                 574.6849578, 582.312636, 579.6445667, NA, NA)), .Names = c("datetime", 
                                                                                                                                                                                                            "var1", "var2"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
                                                                                                                                                                                                                                                                                     -21L), spec = structure(list(cols = structure(list(datetime = structure(list(), class = c("collector_character", 
                                                                                                                                                                                                                                                                                                                                                                               "collector")), var1 = structure(list(), class = c("collector_double", 
                                                                                                                                                                                                                                                                                                                                                                                                                                 "collector")), var2 = structure(list(), class = c("collector_double", 
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   "collector"))), .Names = c("datetime", "var1", "var2")), default = structure(list(), class = c("collector_guess", 
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  "collector"))), .Names = c("cols", "default"), class = "col_spec")))

I would like to filter my data based on the time variable into different periods. I have listed the periods between as From and To as example below

tab_filt <- data.frame(structure(list(From = c("7/17/2017 8:16:53", "7/17/2017 8:17:04", 
                                               "7/17/2017 8:17:19"), To = c("7/17/2017 8:16:59", "7/17/2017 8:17:10", 
                                                                            "7/17/2017 8:17:27")), .Names = c("From", "To"), class = c("tbl_df", 
                                                                                                                                       "tbl", "data.frame"), row.names = c(NA, -3L), spec = structure(list(
                                                                                                                                         cols = structure(list(From = structure(list(), class = c("collector_character", 
                                                                                                                                                                                                  "collector")), To = structure(list(), class = c("collector_character", 
                                                                                                                                                                                                                                                  "collector"))), .Names = c("From", "To")), default = structure(list(), class = c("collector_guess", 
                                                                                                                                                                                                                                                                                                                                   "collector"))), .Names = c("cols", "default"), class = "col_spec")))

To ease your help I have also converted time into Posixct for the example data

data$datetime <- as.POSIXct(strptime(data$datetime, format="%m/%d/%Y %H:%M:%S"))
tab_filt$From <- as.POSIXct(strptime(tab_filt$From, format="%m/%d/%Y %H:%M:%S"))
tab_filt$To <- as.POSIXct(strptime(tab_filt$To, format="%m/%d/%Y %H:%M:%S"))

I would to like to know how could I filter my data only for the periods in the second table. Please Help

Let me know if you need any additional details :)

like image 506
Kathiravan Meeran Avatar asked May 01 '26 13:05

Kathiravan Meeran


2 Answers

Here is a neat way using the packge lubridate:

library(lubridate)
library(dplyr)

# create intervals using %--%
ints <- tab_filt$From %--% tab_filt$To

# check for each row if datetime lies in any of the intervals using %within%
data %>% 
  rowwise() %>%
  mutate(In = any(datetime %within% ints))

This results in

# A tibble: 21 x 4
   datetime             var1  var2 In   
   <dttm>              <dbl> <dbl> <lgl>
 1 2017-07-17 08:16:53  252.  583. TRUE 
 2 2017-07-17 08:16:55  254.  583. TRUE 
 3 2017-07-17 08:16:57  253.  583. TRUE 
 4 2017-07-17 08:16:59  252.  582. TRUE 
 5 2017-07-17 08:17:01  253.  580. FALSE
 6 2017-07-17 08:17:02  254.  580. FALSE
 7 2017-07-17 08:17:04  250.  579. TRUE 
 8 2017-07-17 08:17:06  250.  577. TRUE 
 9 2017-07-17 08:17:08  249.  576. TRUE 
10 2017-07-17 08:17:10  254.  574. TRUE 
# ... with 11 more rows

where In = FALSE indicates that these rows should be dropped. To do so, just add %>% filter(In) to the pipe above.

like image 160
Martin Schmelzer Avatar answered May 04 '26 04:05

Martin Schmelzer


First of all I must thank OP for adding sample data and related commands to convert to date fields.

One can use data.table to join data with tab_filt to filter data which falls within range of From and To defined in tab_filt:

library(data.table)
setDT(data)
setDT(tab_filt)

data[tab_filt, .(x.datetime,x.var1,x.var2), on=.(datetime <= To, datetime >= From)]
#             x.datetime  x.var1   x.var2
# 1: 2017-07-17 08:16:53 252.2349 582.5767
# 2: 2017-07-17 08:16:55 254.0437 583.0973
# 3: 2017-07-17 08:16:57 252.5279 582.7873
# 4: 2017-07-17 08:16:59 252.4802 582.3126
# 5: 2017-07-17 08:17:04 249.8086 578.9575
# 6: 2017-07-17 08:17:06 249.5646 576.5341
# 7: 2017-07-17 08:17:08 249.1816 575.8461
# 8: 2017-07-17 08:17:10 253.9509 574.2353
# 9: 2017-07-17 08:17:19 251.8416 574.1717
# 10: 2017-07-17 08:17:21 254.2060 573.8133
# 11: 2017-07-17 08:17:22 253.9145 574.6850
# 12: 2017-07-17 08:17:27 252.4802 582.3126
like image 25
MKR Avatar answered May 04 '26 03:05

MKR