I have a simple dataframe like so:
   p     b
0  a   buy
1  b   buy
2  a  sell
3  b  sell
and a lookup table like this:
   p     b    v
0  a   buy  123
1  a  sell  456
2  a     *  888
3  b     *  789
How can I merge (join) the two dataframes while respecting the '*' wildcard in column b? The expected result is:
   p     b    v
0  a   buy  123
1  b   buy  789
2  a  sell  456
3  b  sell  789
The best I can come up with is this, but it's pretty ugly and verbose:
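import pandas as pd

data = pd.DataFrame([
    ['a', 'buy'],
    ['b', 'buy'],
    ['a', 'sell'],
    ['b', 'sell'],
], columns=['p', 'b'])
lookup = pd.DataFrame([
    ['a', 'buy', 123],
    ['a', 'sell', 456],
    ['a', '*', 888],
    ['b', '*', 789],
], columns=['p', 'b', 'v'])
# Keep the original row labels in an 'index' column so they survive the merges.
x = data.reset_index()
# First pass: exact match on both p and b.
y1 = pd.merge(x, lookup, on=['p', 'b'], how='left').set_index('index')
# Second pass: match the still-unmatched rows on p alone, against wildcard rows only
# (merging against the full lookup would duplicate rows for any p with several entries).
wild = lookup.loc[lookup['b'] == '*', ['p', 'v']]
y2 = pd.merge(x[y1['v'].isnull()], wild, on=['p'], how='left').set_index('index')
data['v'] = y1['v'].fillna(y2['v'])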
Is there a smarter way?
I think it is a little cleaner to split out the wildcards first:
In [11]: wildcards = lookup[lookup["b"] == "*"]
In [12]: wildcards.pop("b") # ditch the * column, it'll confuse the later merge
Now you can combine the two merges (without needing set_index) with an update:
In [13]: res = data.merge(lookup, how="left")
In [14]: res
Out[14]:
   p     b      v
0  a   buy  123.0
1  b   buy    NaN
2  a  sell  456.0
3  b  sell    NaN
In [15]: res.update(data.merge(wildcards, how="left"), overwrite=False)
In [16]: res
Out[16]:
   p     b      v
0  a   buy  123.0
1  b   buy  789.0
2  a  sell  456.0
3  b  sell  789.0
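The two steps above (an exact merge, then a wildcard merge to fill the gaps) can be wrapped in one helper. This is only a sketch: the name merge_with_wildcard is hypothetical, it uses fillna in place of update, and it assumes lookup has at most one row per (p, b) pair and at most one wildcard row per p, so both merges return exactly one row per row of data.

```python
import pandas as pd

def merge_with_wildcard(data, lookup, wildcard="*"):
    """Attach v from lookup, preferring exact (p, b) rows over wildcard rows."""
    is_wild = lookup["b"] == wildcard
    # Exact matches on both keys; rows without one get NaN in v.
    exact = data.merge(lookup[~is_wild], on=["p", "b"], how="left")
    # Wildcard rows are keyed by p alone, so drop their b column before merging.
    wild = data.merge(lookup.loc[is_wild, ["p", "v"]], on="p", how="left")
    # Under the uniqueness assumption, both results keep data's row order
    # with a fresh 0..n-1 index, so fillna lines the v columns up row by row.
    exact["v"] = exact["v"].fillna(wild["v"])
    return exact

data = pd.DataFrame(
    [["a", "buy"], ["b", "buy"], ["a", "sell"], ["b", "sell"]],
    columns=["p", "b"],
)
lookup = pd.DataFrame(
    [["a", "buy", 123], ["a", "sell", 456], ["a", "*", 888], ["b", "*", 789]],
    columns=["p", "b", "v"],
)

result = merge_with_wildcard(data, lookup)
print(result)
```

As in the session above, v comes back as floats because the exact merge introduces NaN before the gaps are filled.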
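I find this intuitive. Note that it takes the first matching row, so it relies on exact rows appearing before the wildcard row for each p in lookup: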
def find_lookup(lookup, p, b):
    ps = lookup.p == p
    bs = lookup.b.isin([b, '*'])
    return lookup.loc[ps & bs].iloc[0].replace('*', b)

data.apply(lambda x: find_lookup(lookup, x.loc['p'], x.loc['b']), axis=1)
   p     b    v
0  a   buy  123
1  b   buy  789
2  a  sell  456
3  b  sell  789
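Because find_lookup returns the first matching row, the result depends on the row order of lookup. The sketch below shows one way to guard against that, assuming pandas 1.1 or later for the key argument of sort_values: a stable sort moves the wildcard rows after the exact rows. The reordered lookup and the name ordered are made up for illustration.

```python
import pandas as pd

data = pd.DataFrame(
    [['a', 'buy'], ['b', 'buy'], ['a', 'sell'], ['b', 'sell']],
    columns=['p', 'b'],
)
# Same lookup as in the question, but with a wildcard row listed first.
lookup = pd.DataFrame(
    [['a', '*', 888], ['a', 'buy', 123], ['a', 'sell', 456], ['b', '*', 789]],
    columns=['p', 'b', 'v'],
)

def find_lookup(lookup, p, b):
    ps = lookup.p == p
    bs = lookup.b.isin([b, '*'])
    # First matching row wins, so row order decides exact vs. wildcard.
    return lookup.loc[ps & bs].iloc[0].replace('*', b)

# Sort on a boolean key: exact rows (False) come before wildcard rows (True).
# mergesort is stable, so the relative order within each group is kept.
ordered = lookup.sort_values('b', key=lambda s: s == '*', kind='mergesort')

unsorted_hit = find_lookup(lookup, 'a', 'buy')['v']   # picks the wildcard row
result = data.apply(lambda x: find_lookup(ordered, x['p'], x['b']), axis=1)
print(unsorted_hit)
print(result)
```

With the unsorted lookup, ('a', 'buy') resolves to the wildcard value 888; after sorting, the exact row is found first. A row of data with no match at all would still raise an IndexError from iloc[0].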