I have 2 data frames:
library(data.table)
master = data.table(MasterTimes = as.POSIXct("2015-01-01", tz = "GMT") + seq(1,100,1))
mydata = data.frame(MyTimes = as.POSIXct(c("2015-01-01 00:00:18","2015-01-01 00:00:54","2015-01-01 00:00:48","2015-01-01 00:01:10","2015-01-01 00:01:05"),tz = "GMT"))
I would like to keep any rows in master that fall within a +/- 5 second window of any time in the mydata data frame, and remove the rows in master that do not meet that condition.
Here is a simpler example where mydata has only 1 row:
master = data.table(MasterTimes= as.POSIXct("2015-01-01", tz = "GMT") + seq(1,100,1))
mydata = data.frame(MyTimes = as.POSIXct(c("2015-01-01 00:00:18"),tz = "GMT"))
You can see mydata only contains "2015-01-01 00:00:18". In this case I want to remove all the rows from the master data frame where the time is not within the +/- 5 second window, i.e. I want to remove all rows from master before "2015-01-01 00:00:13" and after "2015-01-01 00:00:23".
That's the simple case, but a harder case is if mydata contains:
mydata = data.frame(MyTimes = as.POSIXct(c("2015-01-01 00:00:18", "2015-01-01 00:00:22"),tz = "GMT"))
In this case, because "2015-01-01 00:00:18" is there again, I would normally remove all the rows in master before "2015-01-01 00:00:13" and after "2015-01-01 00:00:23".
But here I can't do that, because mydata also contains "2015-01-01 00:00:22", whose window runs from "2015-01-01 00:00:17" to "2015-01-01 00:00:27". Because "2015-01-01 00:00:22" is in mydata, I now also need to keep the rows in master from "2015-01-01 00:00:23" to "2015-01-01 00:00:27", so overall I want to keep all the rows in master from "2015-01-01 00:00:13" to "2015-01-01 00:00:27".
Basically I want to keep any row in master that is within a +/- 5 second window of any row in mydata. Any row in master that is not within 5 seconds of some time in mydata should be deleted.
Can you advise how to implement this if master and mydata have more than 1 column, like:
master = data.table(MasterTimes= as.POSIXct("2015-01-01", tz = "GMT") + seq(1,100,1), otherol = seq(1,100,1))
mydata = data.frame(MyTimes = as.POSIXct(c("2015-01-01 00:00:18"),tz = "GMT"),othercol = c(1))
In reality both master and mydata have 50+ columns.
In R, we can use the subset() function to drop rows based on a condition. If we prefer to work with the Tidyverse, we can use the filter() function from dplyr to keep (or, with a negated condition, remove) rows based on the values in a column, which does the same job as subset().
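As a minimal sketch of the subset() approach, here is the single-time case from the question. The names t0 and kept are illustrative only, and master is built as a plain data.frame so the snippet needs nothing beyond base R. The window is treated as inclusive, since the question keeps the rows at 00:00:13 and 00:00:23.

```r
# Single reference time, as in the simpler example from the question.
master <- data.frame(
  MasterTimes = as.POSIXct("2015-01-01", tz = "GMT") + seq(1, 100, 1)
)
t0 <- as.POSIXct("2015-01-01 00:00:18", tz = "GMT")  # illustrative name

# subset() keeps the rows where the condition is TRUE and drops the rest.
kept <- subset(master, MasterTimes >= t0 - 5 & MasterTimes <= t0 + 5)

range(kept$MasterTimes)  # earliest and latest retained times
```

Assuming dplyr is installed, dplyr::filter(master, MasterTimes >= t0 - 5, MasterTimes <= t0 + 5) should select the same rows. Neither form handles several reference times on its own; that needs an "any time in mydata" check.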
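Base R solution:
# TRUE if a single master time lies within +/- 5 seconds (inclusive) of any time in mydata
check_valid_time <- function(row, mydata) {
  any(row >= mydata$MyTimes - 5 & row <= mydata$MyTimes + 5)
}
# Keep only the rows of master that pass the check; all other columns are retained
master[sapply(master$MasterTimes, check_valid_time, mydata), ]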
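For larger inputs, the same check can be done without calling a function once per row. The following is a sketch of one possible vectorised variant, not the answer's original method: it uses outer() to compute every pairwise time difference at once. The names window, gap, keep and kept are illustrative, the window is assumed to be inclusive, and master is built as a plain data.frame with an extra column to show that the other columns are carried along.

```r
master <- data.frame(
  MasterTimes = as.POSIXct("2015-01-01", tz = "GMT") + seq(1, 100, 1),
  othercol    = seq(1, 100, 1)
)
mydata <- data.frame(
  MyTimes = as.POSIXct(c("2015-01-01 00:00:18", "2015-01-01 00:00:54",
                         "2015-01-01 00:00:48", "2015-01-01 00:01:10",
                         "2015-01-01 00:01:05"), tz = "GMT")
)

window <- 5  # half-width of the window in seconds (assumed inclusive)

# Matrix of absolute differences in seconds:
# one row per master time, one column per mydata time.
gap <- abs(outer(as.numeric(master$MasterTimes),
                 as.numeric(mydata$MyTimes), "-"))

# A master row is kept if it is within the window of at least one mydata time.
keep <- rowSums(gap <= window) > 0
kept <- master[keep, ]

nrow(kept)
```

Because keep is just a logical row index, it does not matter how many columns master or mydata have. The trade-off is memory: the gap matrix has nrow(master) * nrow(mydata) cells, so for very large tables an interval join is likely a better fit.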