Merge pandas dataframes where one value is between two others [duplicate]

I need to merge two pandas dataframes on an identifier and a condition where a date in one dataframe is between two dates in the other dataframe.

Dataframe A has a date ("fdate") and an ID ("cusip"):

[image: dataframe A]

I need to merge this with this dataframe B:

[image: dataframe B]

on A.cusip==B.ncusip and A.fdate is between B.namedt and B.nameenddt.

In SQL this would be trivial, but the only way I can see how to do this in pandas is to first merge unconditionally on the identifier, and then filter on the date condition:

    df = pd.merge(A, B, how='inner', left_on='cusip', right_on='ncusip')
    df = df[(df['fdate']>=df['namedt']) & (df['fdate']<=df['nameenddt'])]

Is this really the best way to do this? It seems that it would be much better if one could filter within the merge so as to avoid having a potentially very large dataframe after the merge but before the filter has completed.
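To make the size concern concrete, here is a minimal sketch of the merge-then-filter approach on made-up data. Only the column names (cusip, fdate, ncusip, namedt, nameenddt) come from the question; the values are hypothetical. It compares the row count of the intermediate merge with the row count after filtering.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
A = pd.DataFrame({
    "cusip": ["000001", "000001", "000002"],
    "fdate": pd.to_datetime(["2010-06-30", "2014-06-30", "2012-03-31"]),
})
B = pd.DataFrame({
    "ncusip": ["000001", "000001", "000002"],
    "namedt": pd.to_datetime(["2009-01-01", "2013-01-01", "2013-01-01"]),
    "nameenddt": pd.to_datetime(["2012-12-31", "2015-12-31", "2015-12-31"]),
})

# Step 1: unconditional merge on the identifier. Every A row is paired
# with every B row that shares its ID, regardless of the dates.
merged = pd.merge(A, B, how="inner", left_on="cusip", right_on="ncusip")

# Step 2: keep only rows whose fdate lies in [namedt, nameenddt].
in_range = (merged["fdate"] >= merged["namedt"]) & (merged["fdate"] <= merged["nameenddt"])
result = merged[in_range]

print(len(merged), len(result))
```

Even on this toy input the intermediate frame has more rows than the final result, and the gap grows with the number of date ranges per identifier.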

612

asked Jun 03 '15 18:06

itzy

1 Answer

As you say, this is pretty easy in SQL, so why not do it in SQL?

    import sqlite3
    from datetime import datetime

    import pandas as pd

    # We'll use firelynx's tables:
    presidents = pd.DataFrame({"name": ["Bush", "Obama", "Trump"],
                               "president_id": [43, 44, 45]})
    # '48M' means every 48 month-ends; newer pandas versions spell this '48ME'.
    terms = pd.DataFrame({'start_date': pd.date_range('2001-01-20', periods=5, freq='48M'),
                          'end_date': pd.date_range('2005-01-21', periods=5, freq='48M'),
                          'president_id': [43, 43, 44, 44, 45]})
    war_declarations = pd.DataFrame({"date": [datetime(2001, 9, 14), datetime(2003, 3, 3)],
                                     "name": ["War in Afghanistan", "Iraq War"]})

    # Make the db in memory
    conn = sqlite3.connect(':memory:')

    # Write the tables
    terms.to_sql('terms', conn, index=False)
    presidents.to_sql('presidents', conn, index=False)
    war_declarations.to_sql('wars', conn, index=False)

    # Join each war to the term whose date range contains it,
    # then look up the president for that term.
    qry = '''
        select
            start_date PresTermStart,
            end_date PresTermEnd,
            wars.date WarStart,
            presidents.name Pres
        from
            terms join wars on
            date between start_date and end_date join presidents on
            terms.president_id = presidents.president_id
        '''
    df = pd.read_sql_query(qry, conn)

df:

             PresTermStart          PresTermEnd             WarStart  Pres
    0  2001-01-31 00:00:00  2005-01-31 00:00:00  2001-09-14 00:00:00  Bush
    1  2001-01-31 00:00:00  2005-01-31 00:00:00  2003-03-03 00:00:00  Bush
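The same idea could be applied to the frames from the question. The following is only a sketch: the data is made up, the table names A and B are chosen for illustration, and the dates are stored as ISO-formatted strings (an assumption made here so that SQLite's text comparison in `between` orders them correctly).

```python
import sqlite3

import pandas as pd

# Hypothetical sample data; only the column names come from the question.
# Dates are kept as ISO 'YYYY-MM-DD' strings so that they sort correctly as text.
A = pd.DataFrame({
    "cusip": ["000001", "000001", "000002"],
    "fdate": ["2010-06-30", "2014-06-30", "2012-03-31"],
})
B = pd.DataFrame({
    "ncusip": ["000001", "000001", "000002"],
    "namedt": ["2009-01-01", "2013-01-01", "2013-01-01"],
    "nameenddt": ["2012-12-31", "2015-12-31", "2015-12-31"],
})

# Write both frames to an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
A.to_sql("A", conn, index=False)
B.to_sql("B", conn, index=False)

# The identifier match and the date-range condition are both part of the join.
qry = """
    select A.cusip, A.fdate, B.namedt, B.nameenddt
    from A
    join B
      on A.cusip = B.ncusip
     and A.fdate between B.namedt and B.nameenddt
"""
df = pd.read_sql_query(qry, conn, parse_dates=["fdate", "namedt", "nameenddt"])
conn.close()

print(df)
```

Here the filtering happens inside the database, so pandas only ever sees the matching rows; the cost is a round trip through SQLite.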

answered Sep 22 '22 11:09

cfort

Tags: python, join, pandas, date-range, timespan
