
How to remove rows from a data table based on a condition in another data table

Tags: r, data.table

I have 2 data frames:

master = data.table(MasterTimes= as.POSIXct("2015-01-01", tz = "GMT") + seq(1,100,1))
mydata = data.frame(MyTimes = as.POSIXct(c("2015-01-01 00:00:18","2015-01-01 00:00:54","2015-01-01 00:00:48","2015-01-01 00:01:10","2015-01-01 00:01:05"),tz = "GMT"))

I would like to keep any rows in master that fall within a +/- 5 second window of any time in the mydata data frame, and remove the rows in master that do not meet that condition.

Here is a simpler example where mydata has only 1 row:

master = data.table(MasterTimes= as.POSIXct("2015-01-01", tz = "GMT") + seq(1,100,1))
mydata = data.frame(MyTimes = as.POSIXct(c("2015-01-01 00:00:18"),tz = "GMT"))

You can see mydata only contains "2015-01-01 00:00:18". In this case I want to remove all the rows from master where the time is not within the +/- 5 second window, i.e. I want to remove all rows from master before "2015-01-01 00:00:13" and after "2015-01-01 00:00:23".

That's the simple case, but a harder case is if mydata contains

mydata = data.frame(MyTimes = as.POSIXct(c("2015-01-01 00:00:18", "2015-01-01 00:00:22"),tz = "GMT"))

In this case because "2015-01-01 00:00:18" is there again I would normally remove all the rows in master before "2015-01-01 00:00:13" and after "2015-01-01 00:00:23".

But in this case I can't do that because mydata also contains "2015-01-01 00:00:22", so I also want to keep the rows in master from "2015-01-01 00:00:17" to "2015-01-01 00:00:27".

Because "2015-01-01 00:00:22" is in mydata, I now need to keep the rows in master from "2015-01-01 00:00:23" to "2015-01-01 00:00:27" as well, so the kept range becomes "2015-01-01 00:00:13" to "2015-01-01 00:00:27".

Basically I want to keep any row in master that is within a +/- 5 second window of at least one row in mydata. Any rows in master that are not within 5 seconds of some row in mydata should be deleted.
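To make the expected result concrete, here is a minimal sketch of the two-row case. It assumes a plain data.frame stands in for the data.table (so it runs without extra packages) and that the window bounds are inclusive; the names window, gap, keep and kept are only illustrative.

```r
# Assumption: a base R data.frame is used in place of the data.table.
master <- data.frame(MasterTimes = as.POSIXct("2015-01-01", tz = "GMT") + seq(1, 100, 1))
mydata <- data.frame(MyTimes = as.POSIXct(c("2015-01-01 00:00:18",
                                            "2015-01-01 00:00:22"), tz = "GMT"))

window <- 5  # half-width of the window, in seconds (assumed inclusive)

# Matrix of absolute differences in seconds:
# one row per master time, one column per mydata time.
gap <- abs(outer(as.numeric(master$MasterTimes), as.numeric(mydata$MyTimes), "-"))

# A master row is kept if it is within the window of at least one mydata time.
keep <- rowSums(gap <= window) > 0

kept <- master[keep, , drop = FALSE]
range(kept$MasterTimes)
```

With these inputs the kept rows run from "2015-01-01 00:00:13" through "2015-01-01 00:00:27", which is the union of the two windows.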

Update

Can you advise how to implement this if master and mydata have more than 1 column like:

master = data.table(MasterTimes= as.POSIXct("2015-01-01", tz = "GMT") + seq(1,100,1), othercol = seq(1,100,1))
mydata = data.frame(MyTimes = as.POSIXct(c("2015-01-01 00:00:18"),tz = "GMT"),othercol = c(1))

In reality both master and mydata have 50+ columns.
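For the multi-column case, one possible approach is to compute a logical vector from the two time columns only and use it to subset rows, which leaves every other column untouched. The sketch below assumes plain data.frames, inclusive bounds, and a second column named othercol in both tables; within_any_window is a hypothetical helper name, not an existing function.

```r
# Assumption: data.frames stand in for the real 50+ column tables.
master <- data.frame(MasterTimes = as.POSIXct("2015-01-01", tz = "GMT") + seq(1, 100, 1),
                     othercol = seq(1, 100, 1))
mydata <- data.frame(MyTimes = as.POSIXct("2015-01-01 00:00:18", tz = "GMT"),
                     othercol = 1)

# Hypothetical helper: TRUE for each element of `times` that lies within
# `window` seconds (inclusive) of any element of `targets`.
within_any_window <- function(times, targets, window = 5) {
  targets <- as.numeric(targets)
  vapply(as.numeric(times),
         function(t) any(abs(t - targets) <= window),
         logical(1))
}

# Only the time columns are used to decide which rows to keep;
# row subsetting then carries all other columns along.
keep <- within_any_window(master$MasterTimes, mydata$MyTimes)
filtered <- master[keep, , drop = FALSE]
```

If master is a data.table, master[keep] should select the same rows with all columns retained.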

asked Mar 23 '16 17:03 by user3022875



1 Answer

Base R solution:

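# Returns TRUE if `row` (a single master time) lies within 5 seconds of
# any time in mydata. The bounds are inclusive so that 00:00:13 and
# 00:00:23 are kept for a target of 00:00:18, as the question asks.
check_valid_time <- function(row, mydata){
   any(row >= mydata$MyTimes - 5 & row <= mydata$MyTimes + 5)
}

# Keep only the rows of master whose time passes the check
master[sapply(master$MasterTimes, check_valid_time, mydata),]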
answered Oct 19 '22 11:10 by C_Z_
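The answer above compares every master time against every mydata time, which is fine for small inputs. If both tables are large, a sorted lookup may scale better. The following is only a sketch of that alternative, not part of the original answer: it assumes inclusive bounds and plain data.frames, and within_any_window_sorted is a hypothetical helper name. It uses base R's findInterval to locate, for each master time, the nearest mydata time on either side.

```r
# Assumption: data.frames stand in for the data.table; inputs are the
# five-time example from the question.
master <- data.frame(MasterTimes = as.POSIXct("2015-01-01", tz = "GMT") + seq(1, 100, 1))
mydata <- data.frame(MyTimes = as.POSIXct(c("2015-01-01 00:00:18", "2015-01-01 00:00:54",
                                            "2015-01-01 00:00:48", "2015-01-01 00:01:10",
                                            "2015-01-01 00:01:05"), tz = "GMT"))

# Hypothetical helper: TRUE where a time is within `window` seconds
# (inclusive) of its nearest target. Assumes `targets` is non-empty.
within_any_window_sorted <- function(times, targets, window = 5) {
  x <- as.numeric(times)
  s <- sort(as.numeric(targets))
  n <- length(s)
  # lo[i] is the index of the largest target <= x[i], or 0 if there is none.
  lo <- findInterval(x, s)
  # Distance to the nearest target at or below, and to the nearest one above.
  below <- ifelse(lo >= 1, x - s[pmax(lo, 1)], Inf)
  above <- ifelse(lo < n, s[pmin(lo + 1, n)] - x, Inf)
  pmin(below, above) <= window
}

keep <- within_any_window_sorted(master$MasterTimes, mydata$MyTimes)
result <- master[keep, , drop = FALSE]
```

For this input the windows around 00:00:48, 00:00:54, 00:01:05 and 00:01:10 overlap or touch, so the kept rows form two runs: 00:00:13 to 00:00:23 and 00:00:43 to 00:01:15.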