pandas: Remove all rows within time interval of another series's time index (i.e. time range exclusion)

Tags:

pandas

Suppose I have two dataframes:

#df1
time
2016-09-12 13:00:00.017    1.0
2016-09-12 13:00:03.233    1.0
2016-09-12 13:00:10.256    1.0
2016-09-12 13:00:19.605    1.0

#df2
time
2016-09-12 13:00:00.017    1.0
2016-09-12 13:00:00.233    0.0
2016-09-12 13:00:01.016    1.0
2016-09-12 13:00:01.505    0.0
2016-09-12 13:00:06.017    1.0
2016-09-12 13:00:07.233    0.0
2016-09-12 13:00:08.256    1.0
2016-09-12 13:00:19.705    0.0

I want to remove all rows in df2 that fall within +1 second after any of the time indices in df1, yielding:

#result
time
2016-09-12 13:00:01.505    0.0
2016-09-12 13:00:06.017    1.0
2016-09-12 13:00:07.233    0.0
2016-09-12 13:00:08.256    1.0

What's the most efficient way to do this? I don't see anything useful for time range exclusions in the API.

asked Nov 09 '16 17:11

1 Answer

You can use pd.merge_asof, which was added in pandas 0.19.0. It accepts a tolerance argument that limits how far apart two keys may be and still count as a match.

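import pandas as pd

# Assuming time is set as the index of both frames, move it to a column
df1 = df1.reset_index()
df2 = df2.reset_index()

# Keep the rows of df2 that found no df1 match within the preceding 1 second
merged = pd.merge_asof(df2, df1, on='time', tolerance=pd.Timedelta('1s'))
df2.loc[merged.isnull().any(axis=1)]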

Note that matching defaults to the backward direction: for each row of the left DataFrame (df2), merge_asof selects the last row of the right DataFrame (df1) whose "on" key ("time") is less than or equal to the left key. The tolerance therefore extends only backward from each df2 row, so a df2 row is matched (and here removed) only when a df1 timestamp lies at most 1 second before it. That is the "+1 second after df1" window the question asks for.
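To make the backward matching concrete, here is a self-contained sketch rebuilt from the sample data in the question. The column names value and df1_time are assumptions (the question does not name its columns), and carrying a copy of df1's key into the merge is just one way to make the matched timestamp visible.

```python
import pandas as pd

# Sample data from the question; "value" is an assumed column name
df1 = pd.DataFrame({
    "time": pd.to_datetime([
        "2016-09-12 13:00:00.017",
        "2016-09-12 13:00:03.233",
        "2016-09-12 13:00:10.256",
        "2016-09-12 13:00:19.605",
    ]),
    "value": [1.0, 1.0, 1.0, 1.0],
})
df2 = pd.DataFrame({
    "time": pd.to_datetime([
        "2016-09-12 13:00:00.017",
        "2016-09-12 13:00:00.233",
        "2016-09-12 13:00:01.016",
        "2016-09-12 13:00:01.505",
        "2016-09-12 13:00:06.017",
        "2016-09-12 13:00:07.233",
        "2016-09-12 13:00:08.256",
        "2016-09-12 13:00:19.705",
    ]),
    "value": [1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0],
})

# Copy df1's key into an extra column (assumed name: df1_time) so the
# matched timestamp is still visible after the merge
right = df1.assign(df1_time=df1["time"])

# Default direction is backward: each df2 row looks for the last
# df1 time that is <= its own time and at most 1 second earlier
merged = pd.merge_asof(df2, right, on="time", tolerance=pd.Timedelta("1s"))

# Rows with no match have df1_time == NaT; those are the rows to keep
result = df2.loc[merged["df1_time"].isna()]
print(result)
```

Testing only the df1_time column, rather than any column of the merged frame, avoids accidentally keeping rows whose own values in df2 happen to be NaN.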

To look both forward and backward, pass direction='nearest' to the function call, which is available starting with 0.20.0. The tolerance then applies on both sides, giving a +/- window of matching around each key.
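The difference between the two directions can be sketched on a small made-up data set. The helper drop_rows_near, the event_time column, and the timestamps below are all hypothetical and chosen only so that the backward and nearest results differ. This is an illustration, not part of the pandas API.

```python
import pandas as pd

def drop_rows_near(samples, events, tolerance, direction="backward"):
    """Return the rows of `samples` with no `events` time within `tolerance`.

    Hypothetical helper: both frames need a sorted 'time' column.
    """
    # Copy the key so a successful match leaves a non-null marker column
    right = events[["time"]].assign(event_time=events["time"])
    merged = pd.merge_asof(samples, right, on="time",
                           tolerance=tolerance, direction=direction)
    # merge_asof keeps the left frame's row order, so a positional mask is safe
    return samples.loc[merged["event_time"].isna().to_numpy()]

# Made-up data for illustration only
events = pd.DataFrame({"time": pd.to_datetime([
    "2016-09-12 13:00:03.233",
    "2016-09-12 13:00:10.256",
])})
samples = pd.DataFrame({"time": pd.to_datetime([
    "2016-09-12 13:00:02.500",  # 0.733 s before an event
    "2016-09-12 13:00:03.900",  # 0.667 s after an event
    "2016-09-12 13:00:06.000",  # far from both events
    "2016-09-12 13:00:09.800",  # 0.456 s before an event
    "2016-09-12 13:00:11.500",  # 1.244 s after an event
])})

one_second = pd.Timedelta("1s")

# Backward: only samples up to 1 s AFTER an event are dropped
kept_backward = drop_rows_near(samples, events, one_second)

# Nearest: samples within 1 s on EITHER side of an event are dropped
kept_nearest = drop_rows_near(samples, events, one_second, direction="nearest")

print(kept_backward.index.tolist())  # [0, 2, 3, 4]
print(kept_nearest.index.tolist())   # [2, 4]
```

With the backward default, the samples at 13:00:02.500 and 13:00:09.800 survive because the nearby event comes after them. With direction='nearest' they are dropped as well.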

answered Sep 22 '22 10:09

Nickil Maveli

elleciel
