How to join two dataframes for which column values are within a certain range?

Given two dataframes, df_1 and df_2, how can I join them so that the datetime column timestamp in df_1 falls between start and end in df_2?

    print(df_1)
                 timestamp         A         B
    0  2016-05-14 10:54:33  0.020228  0.026572
    1  2016-05-14 10:54:34  0.057780  0.175499
    2  2016-05-14 10:54:35  0.098808  0.620986
    3  2016-05-14 10:54:36  0.158789  1.014819
    4  2016-05-14 10:54:39  0.038129  2.384590

    print(df_2)
                     start                  end event
    0  2016-05-14 10:54:31  2016-05-14 10:54:33    E1
    1  2016-05-14 10:54:34  2016-05-14 10:54:37    E2
    2  2016-05-14 10:54:38  2016-05-14 10:54:42    E3

The goal is to get the corresponding event for each row where df_1.timestamp is between df_2.start and df_2.end:

                 timestamp         A         B event
    0  2016-05-14 10:54:33  0.020228  0.026572    E1
    1  2016-05-14 10:54:34  0.057780  0.175499    E2
    2  2016-05-14 10:54:35  0.098808  0.620986    E2
    3  2016-05-14 10:54:36  0.158789  1.014819    E2
    4  2016-05-14 10:54:39  0.038129  2.384590    E3
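
To try the answers locally, the two frames can be rebuilt from the printed output above. This is only a sketch: the values are copied from the question, and parsing the strings with pd.to_datetime is an assumption about how the original columns were typed.

```python
import pandas as pd

# Sample data copied from the question; the strings are parsed to datetime64
# so that range comparisons against start/end work as expected.
df_1 = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2016-05-14 10:54:33",
        "2016-05-14 10:54:34",
        "2016-05-14 10:54:35",
        "2016-05-14 10:54:36",
        "2016-05-14 10:54:39",
    ]),
    "A": [0.020228, 0.057780, 0.098808, 0.158789, 0.038129],
    "B": [0.026572, 0.175499, 0.620986, 1.014819, 2.384590],
})

df_2 = pd.DataFrame({
    "start": pd.to_datetime([
        "2016-05-14 10:54:31",
        "2016-05-14 10:54:34",
        "2016-05-14 10:54:38",
    ]),
    "end": pd.to_datetime([
        "2016-05-14 10:54:33",
        "2016-05-14 10:54:37",
        "2016-05-14 10:54:42",
    ]),
    "event": ["E1", "E2", "E3"],
})
```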


asked Oct 02 '17 12:10

DougKruger


2 Answers

One simple solution is to create an IntervalIndex from start and end with closed='both', then use get_loc to look up the event for each timestamp (this assumes all the datetime columns have a datetime dtype):

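    # index df_2 by the closed interval [start, end]
    df_2.index = pd.IntervalIndex.from_arrays(df_2['start'], df_2['end'], closed='both')
    # for each timestamp, find the position of the interval that contains it
    df_1['event'] = df_1['timestamp'].apply(lambda x: df_2.iloc[df_2.index.get_loc(x)]['event'])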

Output :

                 timestamp         A         B event
    0  2016-05-14 10:54:33  0.020228  0.026572    E1
    1  2016-05-14 10:54:34  0.057780  0.175499    E2
    2  2016-05-14 10:54:35  0.098808  0.620986    E2
    3  2016-05-14 10:54:36  0.158789  1.014819    E2
    4  2016-05-14 10:54:39  0.038129  2.384590    E3
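
One thing to watch with the get_loc approach: a timestamp that falls in none of the intervals makes get_loc raise a KeyError. A minimal sketch of a tolerant lookup follows. The helper name lookup_event, the default argument, and the shortened df_2 are made up for illustration, and the intervals are assumed not to overlap.

```python
import pandas as pd

# Shortened, hypothetical version of df_2 from the question.
df_2 = pd.DataFrame({
    "start": pd.to_datetime(["2016-05-14 10:54:31", "2016-05-14 10:54:34"]),
    "end": pd.to_datetime(["2016-05-14 10:54:33", "2016-05-14 10:54:37"]),
    "event": ["E1", "E2"],
})
intervals = pd.IntervalIndex.from_arrays(df_2["start"], df_2["end"], closed="both")


def lookup_event(ts, default=None):
    """Return the event whose interval contains ts, or default if none does."""
    try:
        # With non-overlapping intervals, get_loc returns a single integer position.
        pos = intervals.get_loc(ts)
    except KeyError:
        # ts is not covered by any interval.
        return default
    return df_2["event"].iloc[pos]


stamps = pd.Series(pd.to_datetime(["2016-05-14 10:54:35", "2016-05-14 10:54:50"]))
events = stamps.apply(lookup_event)  # second timestamp has no matching interval
```

Whether unmatched rows should become missing values, a sentinel string, or an error depends on the data, so the default here is only a placeholder.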


answered Oct 12 '22 03:10

Bharath

First use IntervalIndex to build a reference index from the intervals of interest, then use get_indexer to find, for each timestamp, the position of the matching row in the dataframe that holds the events.

    idx = pd.IntervalIndex.from_arrays(df_2['start'], df_2['end'], closed='both')
    # get_indexer returns the position of the interval containing each timestamp;
    # .loc works with those positions because df_2 still has its default integer index
    event = df_2.loc[idx.get_indexer(df_1.timestamp), 'event']

    event
    0    E1
    1    E2
    1    E2
    1    E2
    2    E3
    Name: event, dtype: object

    # to_numpy() drops the index so the values are assigned row by row
    df_1['event'] = event.to_numpy()
    df_1
                 timestamp         A         B event
    0  2016-05-14 10:54:33  0.020228  0.026572    E1
    1  2016-05-14 10:54:34  0.057780  0.175499    E2
    2  2016-05-14 10:54:35  0.098808  0.620986    E2
    3  2016-05-14 10:54:36  0.158789  1.014819    E2
    4  2016-05-14 10:54:39  0.038129  2.384590    E3

Reference: A question on IntervalIndex.get_indexer.
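
A caveat with get_indexer: it returns -1 for any timestamp that no interval contains, and using -1 directly as a position or label would either pick the wrong row or fail. The sketch below masks those entries explicitly. The third timestamp and the choice of None as the fill value are assumptions added for illustration, and the intervals are assumed not to overlap, which get_indexer requires.

```python
import numpy as np
import pandas as pd

df_2 = pd.DataFrame({
    "start": pd.to_datetime(["2016-05-14 10:54:31", "2016-05-14 10:54:34", "2016-05-14 10:54:38"]),
    "end": pd.to_datetime(["2016-05-14 10:54:33", "2016-05-14 10:54:37", "2016-05-14 10:54:42"]),
    "event": ["E1", "E2", "E3"],
})
idx = pd.IntervalIndex.from_arrays(df_2["start"], df_2["end"], closed="both")

# The last timestamp is hypothetical and lies outside every interval.
stamps = pd.Series(pd.to_datetime([
    "2016-05-14 10:54:33",
    "2016-05-14 10:54:39",
    "2016-05-14 10:55:00",
]))

pos = idx.get_indexer(stamps)   # position of the containing interval, -1 if none
matched = pos >= 0

# Positional lookup on the raw values; entries where pos == -1 temporarily
# pick up the last row and are then overwritten with None by the mask.
events = np.where(matched, df_2["event"].to_numpy()[pos], None)
```

The resulting array can be assigned as a new column in the same way as event.to_numpy() in the answer above.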

answered Oct 12 '22 04:10

cs95


Tags: python, datetime, pandas, dataframe, intervals