
How can I merge two dataframes with 'wildcards'?

Tags: python, pandas

I have a simple dataframe like so:

   p     b
0  a   buy
1  b   buy
2  a  sell
3  b  sell

and a lookup table like this:

   p     b    v
0  a   buy  123
1  a  sell  456
2  a     *  888
3  b     *  789

How can I merge (join) the two dataframes, but respecting the 'wildcard' in column b, i.e. the expected result is:

   p     b    v
0  a   buy  123
1  b   buy  789
2  a  sell  456
3  b  sell  789

The best I can come up with is this, but it's pretty ugly and verbose:

import pandas as pd

data = pd.DataFrame([
        ['a', 'buy'],
        ['b', 'buy'],
        ['a', 'sell'],
        ['b', 'sell'],
    ], columns=['p', 'b'])
lookup = pd.DataFrame([
        ['a', 'buy', 123],
        ['a', 'sell', 456],
        ['a', '*', 888],
        ['b', '*', 789],
    ], columns=['p', 'b', 'v'])

x = data.reset_index()
y1 = pd.merge(x, lookup, on=['p', 'b'], how='left').set_index('index')
y2 = pd.merge(x[y1['v'].isnull()], lookup, on=['p'], how='left' ).set_index('index')
data['v'] = y1['v'].fillna(y2['v'])

Is there a smarter way?

Matthew asked Jun 09 '16 17:06




2 Answers

I think it's a little cleaner to pull the wildcard rows out into their own frame first:

In [11]: wildcards = lookup[lookup["b"] == "*"]

In [12]: wildcards.pop("b")  # ditch the * column, it'll confuse the later merge

Now you can combine the two merges (without needing set_index) with an update:

In [13]: res = data.merge(lookup, how="left")

In [14]: res
Out[14]:
   p     b      v
0  a   buy  123.0
1  b   buy    NaN
2  a  sell  456.0
3  b  sell    NaN

In [15]: res.update(data.merge(wildcards, how="left"), overwrite=False)

In [16]: res
Out[16]:
   p     b      v
0  a   buy  123.0
1  b   buy  789.0
2  a  sell  456.0
3  b  sell  789.0
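
The two merges above can also be wrapped into a single helper. The following is only a sketch, not part of the original answer: the function name `merge_with_wildcards` and its parameters are hypothetical, it uses `fillna` in place of `update`, and it assumes each `p` has at most one wildcard row in `lookup` (more than one would duplicate rows in the fallback merge and break the alignment).

```python
import pandas as pd


def merge_with_wildcards(data, lookup, key="p", match="b", value="v", wildcard="*"):
    """Left-join data to lookup on (key, match), falling back to wildcard rows.

    Hypothetical helper; assumes at most one wildcard row per key in lookup.
    """
    is_wild = lookup[match] == wildcard
    exact = lookup[~is_wild]                  # rows with a concrete value in `match`
    wild = lookup.loc[is_wild, [key, value]]  # wildcard rows, without the '*' column

    # First pass: exact matches on both columns; unmatched rows get NaN.
    res = data.merge(exact, on=[key, match], how="left")
    # Second pass: per-key fallback values taken from the wildcard rows.
    fallback = data.merge(wild, on=key, how="left")[value]
    # Both merge results carry a fresh 0..n-1 index, so fillna aligns row by row.
    res[value] = res[value].fillna(fallback)
    return res


data = pd.DataFrame(
    [["a", "buy"], ["b", "buy"], ["a", "sell"], ["b", "sell"]],
    columns=["p", "b"],
)
lookup = pd.DataFrame(
    [["a", "buy", 123], ["a", "sell", 456], ["a", "*", 888], ["b", "*", 789]],
    columns=["p", "b", "v"],
)

result = merge_with_wildcards(data, lookup)
print(result)
```

As in the session above, `v` comes back as floats because the first merge introduces NaN; cast with `astype(int)` afterwards if every row is guaranteed a match.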
Andy Hayden answered Sep 26 '22 19:09


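I find a row-by-row lookup intuitive: for each row of data, take the first row of lookup whose p matches and whose b is either the same value or '*':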

def find_lookup(lookup, p, b):
    ps = lookup.p == p
    bs = lookup.b.isin([b, '*'])
    return lookup.loc[ps & bs].iloc[0].replace('*', b)

data.apply(lambda x: find_lookup(lookup, x.loc['p'], x.loc['b']), axis=1)

   p     b    v
0  a   buy  123
1  b   buy  789
2  a  sell  456
3  b  sell  789
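
Because `find_lookup` takes `.iloc[0]`, it returns the exact match only when that row appears before the wildcard row in `lookup`, as it does in the question's data. A possible variant that prefers exact matches regardless of row order is sketched below; `find_value` is a hypothetical name, and returning only the `v` value (or `None` when nothing matches) is an assumption made for this illustration.

```python
import pandas as pd


def find_value(lookup, p, b, wildcard="*"):
    """Return v for (p, b), preferring an exact b over the wildcard row.

    Hypothetical variant of find_lookup; returns None when nothing matches.
    """
    candidates = lookup[(lookup["p"] == p) & lookup["b"].isin([b, wildcard])]
    if candidates.empty:
        return None
    exact = candidates[candidates["b"] == b]
    chosen = exact if not exact.empty else candidates
    return chosen["v"].iloc[0]


data = pd.DataFrame(
    [["a", "buy"], ["b", "buy"], ["a", "sell"], ["b", "sell"]],
    columns=["p", "b"],
)
# The wildcard row for 'a' is deliberately listed first here.
lookup = pd.DataFrame(
    [["a", "*", 888], ["a", "buy", 123], ["a", "sell", 456], ["b", "*", 789]],
    columns=["p", "b", "v"],
)

values = data.apply(lambda row: find_value(lookup, row["p"], row["b"]), axis=1)
print(data.assign(v=values))
```

Like the original, this scans `lookup` once per row of `data`, so it suits small tables better than large ones.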
piRSquared answered Sep 23 '22 19:09