I have two tables A) Contains "Entry number" and "Amount" B) Contains "From Entry number" and "To Entry Number" and "Created Date"
I would then like to join the two, but I don't have a specific join key as table A has specific "Entry number" - for instance "1000", but table B has intervals, for instance "From Entry number" "900" and "To Entry number" "1100". In SQL I would be able to join them as:
select * from table A
left join table B
on a.[Entry number] >= b.[From Entry number] and a.[Entry number] <= b.[To Entry number]
But how can I achieve that is Pandas?
Disclamer: This solution could be quite memory-heavy.
I'm using the following two dataframes:
import pandas as pd
dfa = pd.DataFrame({"Entry Number": [1000, 50],
"Amount":[4,2]})
dfb = pd.DataFrame({"From Entry Number":[900, 60],
"To Entry Number":[1100, 100],
"Create Date":["2021-01-01", "2020-01-01"]})
dfa:
Entry Number | Amount |
---|---|
1000 | 4 |
50 | 2 |
dfb:
From Entry Number | To Entry Number | Create Date |
---|---|---|
900 | 1100 | 2021-01-01 |
60 | 100 | 2020-01-01 |
First create a Cartesian (cross) Joined dataframe:
df = dfa.merge(dfb, how = 'cross')
This creates the following joined dataframe:
Entry Number | Amount | From Entry Number | To Entry Number | Create Date |
---|---|---|---|---|
1000 | 4 | 900 | 1100 | 2021-01-01 |
1000 | 4 | 60 | 100 | 2020-01-01 |
50 | 2 | 900 | 1100 | 2021-01-01 |
50 | 2 | 60 | 100 | 2020-01-01 |
Now you can filter the dataframe based on your conditions:
df[
(df["Entry Number"] >= df["From Entry Number"]) &
(df["Entry Number"] <= df["To Entry Number"])
]
End result:
Entry Number | Amount | From Entry Number | To Entry Number | Create Date |
---|---|---|---|---|
1000 | 4 | 900 | 1100 | 2021-01-01 |
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With