I have 2 data frames:
master = data.table(MasterTimes= as.POSIXct("2015-01-01", tz = "GMT") + seq(1,100,1))
mydata = data.frame(MyTimes = as.POSIXct(c("2015-01-01 00:00:18","2015-01-01 00:00:54","2015-01-01 00:00:48","2015-01-01 00:01:10","2015-01-01 00:01:05"),tz = "GMT"))
I would like to keep every row in master that falls within a +/- 5 second window of any time in the mydata data frame, and remove the rows in master that do not meet that condition.
Here is a simpler example where mydata has only 1 row:
master = data.table(MasterTimes= as.POSIXct("2015-01-01", tz = "GMT") + seq(1,100,1))
mydata = data.frame(MyTimes = as.POSIXct(c("2015-01-01 00:00:18"),tz = "GMT"))
You can see mydata only contains "2015-01-01 00:00:18". In this case I want to remove all the rows from the master data frame where the time is not within the +/- 5 second window, i.e. I want to remove all rows from master before "2015-01-01 00:00:13" and after "2015-01-01 00:00:23".
That's the simple case, but a harder case is when mydata contains:
mydata = data.frame(MyTimes = as.POSIXct(c("2015-01-01 00:00:18", "2015-01-01 00:00:22"),tz = "GMT"))
In this case, because "2015-01-01 00:00:18" is present again, I would normally remove all the rows in master before "2015-01-01 00:00:13" and after "2015-01-01 00:00:23".
But here I can't do that, because mydata also contains "2015-01-01 00:00:22", whose window runs from "2015-01-01 00:00:17" to "2015-01-01 00:00:27".
So because "2015-01-01 00:00:22" is in mydata, I now also need to keep the rows in master from "2015-01-01 00:00:23" to "2015-01-01 00:00:27".
Basically, I want to keep any row in master that is within a +/- 5 second window of at least one row in mydata. Any rows in master that are not within 5 seconds of some row in mydata should be deleted.
Can you advise how to implement this if master and mydata have more than 1 column like:
master = data.table(MasterTimes= as.POSIXct("2015-01-01", tz = "GMT") + seq(1,100,1), otherol = seq(1,100,1))
mydata = data.frame(MyTimes = as.POSIXct(c("2015-01-01 00:00:18"),tz = "GMT"),othercol = c(1))
In reality both master and mydata have 50+ columns.
In R, we can use the subset() function to drop rows based on a condition. If we prefer to work with the tidyverse, dplyr's filter() function does the same job as subset(): it keeps (or removes) rows conditionally, based on values in a column.
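As a minimal sketch of the subset() approach for the single-timestamp case from the question: this assumes master is a plain data.frame rather than a data.table, that the window boundaries are inclusive, and uses the illustrative names target and kept, which are not from the original post.

```r
# Rebuild the example data with base R only
# (assumption: a data.frame instead of the question's data.table)
master <- data.frame(
  MasterTimes = as.POSIXct("2015-01-01", tz = "GMT") + seq(1, 100, 1)
)
target <- as.POSIXct("2015-01-01 00:00:18", tz = "GMT")

# Keep rows whose time lies within 5 seconds of the target, boundaries included
kept <- subset(master, MasterTimes >= target - 5 & MasterTimes <= target + 5)

nrow(kept)               # number of rows that survive the filter
range(kept$MasterTimes)  # earliest and latest time kept
```

This only covers one timestamp; with several timestamps the condition has to hold for at least one of them.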
Base R solution:
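# Returns TRUE if `time` lies within 5 seconds (inclusive) of any time in mydata
check_valid_time <- function(time, mydata) {
  any(time >= mydata$MyTimes - 5 & time <= mydata$MyTimes + 5)
}
# Keep only the rows of master that fall inside at least one window
master[sapply(master$MasterTimes, check_valid_time, mydata), ]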
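The sapply() version calls the check once per row of master. A possible vectorised alternative, sketched here under the assumption that an nrow(master) by nrow(mydata) matrix fits comfortably in memory, computes all pairwise time differences with outer(). The names diffs, keep and result are illustrative, and plain data frames are used so the sketch runs without data.table. Extra columns are carried along unchanged, because only rows are subset.

```r
# Example data with an extra column on each side (data.frame used as an assumption)
master <- data.frame(
  MasterTimes = as.POSIXct("2015-01-01", tz = "GMT") + seq(1, 100, 1),
  othercol = seq(1, 100, 1)
)
mydata <- data.frame(
  MyTimes = as.POSIXct(c("2015-01-01 00:00:18", "2015-01-01 00:00:22"), tz = "GMT"),
  othercol = c(1, 2)
)

# Pairwise absolute differences in seconds:
# one row per master row, one column per mydata row
diffs <- abs(outer(as.numeric(master$MasterTimes),
                   as.numeric(mydata$MyTimes), "-"))

# Keep a master row if it is within 5 seconds (inclusive) of at least one mydata time
keep <- rowSums(diffs <= 5) > 0
result <- master[keep, ]
```

For the two timestamps above, the two windows overlap and the kept rows span "2015-01-01 00:00:13" to "2015-01-01 00:00:27". The same logical index can be used as master[keep, ] when master is a data.table.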