pandas: Remove all rows within time interval of another series's time index (i.e. time range exclusion)

Tags: python, pandas

Suppose I have two dataframes:

#df1
time
2016-09-12 13:00:00.017    1.0
2016-09-12 13:00:03.233    1.0
2016-09-12 13:00:10.256    1.0
2016-09-12 13:00:19.605    1.0

#df2
time
2016-09-12 13:00:00.017    1.0
2016-09-12 13:00:00.233    0.0
2016-09-12 13:00:01.016    1.0
2016-09-12 13:00:01.505    0.0
2016-09-12 13:00:06.017    1.0
2016-09-12 13:00:07.233    0.0
2016-09-12 13:00:08.256    1.0
2016-09-12 13:00:19.705    0.0

I want to remove every row in df2 whose timestamp falls within 1 second after any timestamp in df1's index, which should yield:

#result
time
2016-09-12 13:00:01.505    0.0
2016-09-12 13:00:06.017    1.0
2016-09-12 13:00:07.233    0.0
2016-09-12 13:00:08.256    1.0

What's the most efficient way to do this? I don't see anything useful for time range exclusions in the API.

elleciel, asked Nov 09 '16 17:11


1 Answer

You can use pd.merge_asof, which was added in pandas 0.19.0. It accepts a tolerance argument that limits how far apart two timestamps may be and still count as a match.

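import pandas as pd

# Assuming time is set as the (sorted) index of both DataFrames
df1.reset_index(inplace=True)
df2.reset_index(inplace=True)

# df2 rows with no df1 timestamp in the preceding 1 second get NaN from the merge
merged = pd.merge_asof(df2, df1, on='time', tolerance=pd.Timedelta('1s'))
df2.loc[merged.isnull().any(axis=1)]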


Note that matching is done in the backward direction by default: for each row of the left DataFrame (df2), merge_asof selects the last row of the right DataFrame (df1) whose "on" key ("time") is less than or equal to the left key. The tolerance therefore extends only backward, so a df2 row is matched (and then dropped) only when a df1 timestamp lies at most 1 second before it.
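To make the backward match concrete, here is a minimal, self-contained sketch that rebuilds the two frames from the question and applies the same merge_asof filter. The column names "marker" and "value" are assumptions (the question shows unnamed columns), and the mask is taken from df1's column only, so NaNs already present in df2 would not affect it.

```python
import pandas as pd

# Frames rebuilt from the question; the column names are assumed for illustration.
df1 = pd.DataFrame(
    {"marker": [1.0, 1.0, 1.0, 1.0]},
    index=pd.DatetimeIndex(
        [
            "2016-09-12 13:00:00.017",
            "2016-09-12 13:00:03.233",
            "2016-09-12 13:00:10.256",
            "2016-09-12 13:00:19.605",
        ],
        name="time",
    ),
)
df2 = pd.DataFrame(
    {"value": [1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]},
    index=pd.DatetimeIndex(
        [
            "2016-09-12 13:00:00.017",
            "2016-09-12 13:00:00.233",
            "2016-09-12 13:00:01.016",
            "2016-09-12 13:00:01.505",
            "2016-09-12 13:00:06.017",
            "2016-09-12 13:00:07.233",
            "2016-09-12 13:00:08.256",
            "2016-09-12 13:00:19.705",
        ],
        name="time",
    ),
)

# merge_asof needs "time" as a column; both frames are already sorted by time.
merged = pd.merge_asof(
    df2.reset_index(),
    df1.reset_index(),
    on="time",
    tolerance=pd.Timedelta("1s"),  # default direction is "backward"
)

# A NaN in df1's column means no df1 timestamp lies within 1s before that row.
keep = merged["marker"].isna().to_numpy()
result = df2.loc[keep]
print(result)
```

The four rows left in result are the ones listed under #result in the question.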

To look both forward and backward, pass direction='nearest' in the function call (available since 0.20.0). The tolerance then applies on both sides, giving a +/- exclusion window around each df1 timestamp.
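The difference between the two directions can be sketched on a small made-up example. The helper drop_near, the "_matched" column, and the sample data below are hypothetical and only illustrate the idea; both inputs are assumed to have a sorted DatetimeIndex named "time".

```python
import pandas as pd


def drop_near(samples, events, tolerance="1s", direction="backward"):
    """Return the rows of `samples` that have no `events` timestamp within
    `tolerance` in the given direction (hypothetical helper)."""
    left = samples.reset_index()
    # Only the event timestamps matter; the marker column turns into NaN
    # wherever merge_asof finds no match.
    right = pd.DataFrame({"time": events.index, "_matched": 1.0})
    merged = pd.merge_asof(
        left,
        right,
        on="time",
        tolerance=pd.Timedelta(tolerance),
        direction=direction,
    )
    return samples.loc[merged["_matched"].isna().to_numpy()]


# Made-up data: one event at 13:00:05 and four samples around it.
events = pd.DataFrame(
    {"marker": [1.0]},
    index=pd.DatetimeIndex(["2016-09-12 13:00:05.000"], name="time"),
)
samples = pd.DataFrame(
    {"value": [0.0, 1.0, 2.0, 3.0]},
    index=pd.DatetimeIndex(
        [
            "2016-09-12 13:00:03.500",
            "2016-09-12 13:00:04.500",
            "2016-09-12 13:00:05.500",
            "2016-09-12 13:00:06.500",
        ],
        name="time",
    ),
)

# Backward: only the sample 0.5s after the event is dropped.
backward = drop_near(samples, events)
# Nearest: the samples 0.5s before and 0.5s after the event are both dropped.
nearest = drop_near(samples, events, direction="nearest")
```

merge_asof also accepts direction='forward', which would drop only the samples that precede an event by at most the tolerance.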

Nickil Maveli, answered Sep 22 '22 10:09