Suppose I have two dataframes:
#df1
time
2016-09-12 13:00:00.017 1.0
2016-09-12 13:00:03.233 1.0
2016-09-12 13:00:10.256 1.0
2016-09-12 13:00:19.605 1.0
#df2
time
2016-09-12 13:00:00.017 1.0
2016-09-12 13:00:00.233 0.0
2016-09-12 13:00:01.016 1.0
2016-09-12 13:00:01.505 0.0
2016-09-12 13:00:06.017 1.0
2016-09-12 13:00:07.233 0.0
2016-09-12 13:00:08.256 1.0
2016-09-12 13:00:19.705 0.0
I want to remove all rows in df2 whose time falls up to +1 second after any of the time indices in df1, yielding:
#result
time
2016-09-12 13:00:01.505 0.0
2016-09-12 13:00:06.017 1.0
2016-09-12 13:00:07.233 0.0
2016-09-12 13:00:08.256 1.0
What's the most efficient way to do this? I don't see anything useful for time range exclusions in the API.
Pandas lets you delete rows from a DataFrame with the DataFrame.drop() method: pass the index label (or labels) of the rows to remove. To drop rows based on a condition, such as a column value matching a static value or another column's value (a common data-cleaning step), first select the index labels of the rows that satisfy the condition, then pass those labels to drop().
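A minimal sketch of both uses of drop() described above, assuming a small hypothetical frame with a column named `value` (neither the frame nor the column name comes from the question):

```python
import pandas as pd

# Hypothetical example frame, not taken from the question
df = pd.DataFrame({"value": [1.0, 0.0, 1.0, 0.0]}, index=["a", "b", "c", "d"])

# Drop a single row by its index label; df itself is left unchanged
one_dropped = df.drop("b")

# Drop every row whose 'value' equals 0.0: build a boolean mask, take the
# index labels of the matching rows, and pass those labels to drop()
zeros_dropped = df.drop(df[df["value"] == 0.0].index)

print(one_dropped.index.tolist())    # ['a', 'c', 'd']
print(zeros_dropped.index.tolist())  # ['a', 'c']
```

Because drop() works on labels, any condition can be used as long as it is first turned into a set of index labels.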
You can use pd.merge_asof, which was added in pandas 0.19.0 and accepts a tolerance argument that limits matches to within a specified time interval.
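import pandas as pd

# Assuming 'time' is set as the index of both DataFrames: move it into a column
df1.reset_index(inplace=True)
df2.reset_index(inplace=True)
# df2 rows with no df1 time in [t - 1s, t] get NaN in df1's columns; keep only those
df2.loc[pd.merge_asof(df2, df1, on='time', tolerance=pd.Timedelta('1s')).isnull().any(axis=1)]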
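The snippet above can be checked end to end on the question's data. This is a sketch that assumes the unnamed value column is called `value`; `left`, `right`, `merged` and `result` are illustrative names. Unlike the snippet above, it works on reset copies so df1 and df2 keep their time index, and it tests only df1's column for NaN, so a NaN already present in df2 cannot keep a row by accident:

```python
import pandas as pd

# Rebuild the question's data; naming the value column 'value' is an assumption
df1 = pd.DataFrame(
    {"value": [1.0, 1.0, 1.0, 1.0]},
    index=pd.DatetimeIndex([
        "2016-09-12 13:00:00.017", "2016-09-12 13:00:03.233",
        "2016-09-12 13:00:10.256", "2016-09-12 13:00:19.605",
    ], name="time"),
)
df2 = pd.DataFrame(
    {"value": [1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]},
    index=pd.DatetimeIndex([
        "2016-09-12 13:00:00.017", "2016-09-12 13:00:00.233",
        "2016-09-12 13:00:01.016", "2016-09-12 13:00:01.505",
        "2016-09-12 13:00:06.017", "2016-09-12 13:00:07.233",
        "2016-09-12 13:00:08.256", "2016-09-12 13:00:19.705",
    ], name="time"),
)

# Work on copies with 'time' as a column; rename df1's column so that its
# NaNs are easy to pick out after the merge
left = df2.reset_index()
right = df1.reset_index().rename(columns={"value": "df1_value"})

# For each df2 row, find the last df1 time that is <= it and at most 1 s earlier
merged = pd.merge_asof(left, right, on="time", tolerance=pd.Timedelta("1s"))

# No match (NaN) means no df1 time in [t - 1 s, t]: keep exactly those df2 rows.
# merged has a RangeIndex, so pass a plain boolean array instead of a Series
result = df2[merged["df1_value"].isna().to_numpy()]
print(result)
```

The printed frame should contain the four rows from the expected result in the question (13:00:01.505, 06.017, 07.233 and 08.256).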
Note that matching defaults to the backward direction: for each row of the left DataFrame (df2), merge_asof selects the last row of the right DataFrame (df1) whose "on" key ("time") is less than or equal to the left row's key. The tolerance therefore extends only in this backward direction, giving a one-sided [t - 1s, t] window around each df2 time. That is exactly what the question asks for: a df2 row is removed when some df1 time lies at most 1 second before it.
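To see which df1 row the backward lookup picks, here is a small sketch with hypothetical timestamps and a hypothetical `df1_label` column that identifies the matched row. It shows that the last df1 time at or before each df2 time wins, and that the tolerance then discards matches lying more than 1 second back:

```python
import pandas as pd

base = pd.Timestamp("2016-09-12 13:00:00")
# Hypothetical right frame (plays df1): two times labelled A and B
df1 = pd.DataFrame({
    "time": base + pd.to_timedelta([10.0, 11.0], unit="s"),
    "df1_label": ["A", "B"],
})
# Hypothetical left frame (plays df2)
df2 = pd.DataFrame({"time": base + pd.to_timedelta([9.5, 11.5, 13.0], unit="s")})

# Default direction='backward': each df2 row takes the last df1 row with time <= its own
no_tol = pd.merge_asof(df2, df1, on="time")
# Same lookup, but a match more than 1 s back is discarded (left as NaN)
with_tol = pd.merge_asof(df2, df1, on="time", tolerance=pd.Timedelta("1s"))

print(no_tol["df1_label"].fillna("none").tolist())    # ['none', 'B', 'B']
print(with_tol["df1_label"].fillna("none").tolist())  # ['none', 'B', 'none']
```

The 9.5 s row never matches, because no df1 time lies at or before it; that one-sidedness is what the next paragraph changes.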
Starting with pandas 0.20.0, lookups in both directions are possible by passing direction='nearest' to merge_asof. The tolerance then extends both ways, giving a +/- window of matching.
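A sketch comparing the two directions as a filter, assuming hypothetical data: one df1 time and df2 rows 0.5 s before it, 0.5 s after it and 2 s after it. The `hit` column is an illustrative marker that is NaN whenever no df1 time fell inside the window:

```python
import pandas as pd

base = pd.Timestamp("2016-09-12 13:00:00")
# Hypothetical data: one df1 time, and df2 rows around it
df1 = pd.DataFrame({"time": base + pd.to_timedelta([10.0], unit="s"), "hit": True})
df2 = pd.DataFrame({"time": base + pd.to_timedelta([9.5, 10.5, 12.0], unit="s")})

kept = {}
for direction in ("backward", "nearest"):
    merged = pd.merge_asof(
        df2, df1, on="time", tolerance=pd.Timedelta("1s"), direction=direction
    )
    # df2 rows with no df1 time inside the window get NaN in 'hit'; keep those
    kept[direction] = df2.loc[merged["hit"].isna(), "time"].tolist()

print(kept["backward"])  # the 9.5 s and 12 s rows survive
print(kept["nearest"])   # only the 12 s row survives
```

With 'backward' only the row after the df1 time is removed (the +1 second window from the question); with 'nearest' the row just before it is removed too.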