How can I merge two dataframes with 'wildcards'?

Tags: python, pandas

I have a simple dataframe like so:

   p     b
0  a   buy
1  b   buy
2  a  sell
3  b  sell

and a lookup table like this:

   p     b    v
0  a   buy  123
1  a  sell  456
2  a     *  888
3  b     *  789

How can I merge (join) the two dataframes, but respecting the 'wildcard' in column b, i.e. the expected result is:

   p     b    v
0  a   buy  123
1  b   buy  789
2  a  sell  456
3  b  sell  789

The best I can come up with is this, but it's pretty ugly and verbose:

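import pandas as pd

# Rows to be enriched with a value from the lookup table.
data = pd.DataFrame([
    ['a', 'buy'],
    ['b', 'buy'],
    ['a', 'sell'],
    ['b', 'sell'],
], columns=['p', 'b'])

# Lookup table; '*' in column b matches any value of b.
lookup = pd.DataFrame([
    ['a', 'buy', 123],
    ['a', 'sell', 456],
    ['a', '*', 888],
    ['b', '*', 789],
], columns=['p', 'b', 'v'])

x = data.reset_index()
# First pass: exact match on both p and b.
y1 = pd.merge(x, lookup, on=['p', 'b'], how='left').set_index('index')
# Second pass: for rows still unmatched, match on p alone.
y2 = pd.merge(x[y1['v'].isnull()], lookup, on=['p'], how='left').set_index('index')
data['v'] = y1['v'].fillna(y2['v'])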

Is there a smarter way?

asked Jun 09 '16 17:06

Matthew

2 Answers

I think it's a little cleaner to separate out the wildcards first:

In [11]: wildcards = lookup[lookup["b"] == "*"]

In [12]: wildcards.pop("b")  # ditch the * column, it'll confuse the later merge

Now you can combine the two merges (without needing set_index) with an update:

In [13]: res = data.merge(lookup, how="left")

In [14]: res
Out[14]:
   p     b      v
0  a   buy  123.0
1  b   buy    NaN
2  a  sell  456.0
3  b  sell    NaN

In [15]: res.update(data.merge(wildcards, how="left"), overwrite=False)

In [16]: res
Out[16]:
   p     b      v
0  a   buy  123.0
1  b   buy  789.0
2  a  sell  456.0
3  b  sell  789.0
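
For reference, the same two-step idea can be sketched as a standalone script. This is only a sketch under some assumptions: the names exact, is_wild and fallback are made up here, fillna stands in for update, and the lookup is assumed to hold at most one exact row per (p, b) pair and at most one wildcard row per p, so that both left merges keep the row order and length of data.

```python
import pandas as pd

data = pd.DataFrame(
    [["a", "buy"], ["b", "buy"], ["a", "sell"], ["b", "sell"]],
    columns=["p", "b"],
)
lookup = pd.DataFrame(
    [["a", "buy", 123], ["a", "sell", 456], ["a", "*", 888], ["b", "*", 789]],
    columns=["p", "b", "v"],
)

# Split the lookup into exact rows and wildcard rows (hypothetical names).
is_wild = lookup["b"] == "*"
exact = lookup[~is_wild]
wildcards = lookup[is_wild].drop(columns="b")

# Exact match on (p, b) first; rows without a match get NaN in v.
res = data.merge(exact, on=["p", "b"], how="left")

# Wildcard value for the same p, aligned row by row with data.
fallback = data.merge(wildcards, on="p", how="left")["v"]

# Use the wildcard value only where the exact match is missing.
res["v"] = res["v"].fillna(fallback)

print(res)
```

Because the first merge introduces NaN, v ends up as a float column, as in the output above. If the lookup could contain duplicate keys, the merges would add rows and the positional alignment assumed here would no longer hold.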

answered Sep 26 '22 19:09

Andy Hayden

I find this intuitive:

def find_lookup(lookup, p, b):
    ps = lookup.p == p
    bs = lookup.b.isin([b, '*'])
    return lookup.loc[ps & bs].iloc[0].replace('*', b)

data.apply(lambda x: find_lookup(lookup, x.loc['p'], x.loc['b']), axis=1)

   p     b    v
0  a   buy  123
1  b   buy  789
2  a  sell  456
3  b  sell  789
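
Note that iloc[0] picks whichever matching row comes first, so this relies on exact rows appearing before wildcard rows in lookup. A possible way to remove that dependency is to sort the lookup first. The sketch below is an assumption-laden variant, not the code above: the name ordered is made up, the lookup is deliberately reordered to show the issue, the function builds its result row explicitly instead of using replace, and sort_values with a key argument needs pandas 1.1 or later.

```python
import pandas as pd

data = pd.DataFrame(
    [["a", "buy"], ["b", "buy"], ["a", "sell"], ["b", "sell"]],
    columns=["p", "b"],
)
# Wildcard row for 'a' deliberately listed first to show the ordering issue.
lookup = pd.DataFrame(
    [["a", "*", 888], ["a", "buy", 123], ["a", "sell", 456], ["b", "*", 789]],
    columns=["p", "b", "v"],
)

# Stable sort on "is this a wildcard?": exact rows (False) come first.
ordered = lookup.sort_values("b", key=lambda s: s == "*", kind="stable")


def find_lookup(lookup, p, b):
    ps = lookup["p"] == p
    bs = lookup["b"].isin([b, "*"])
    # First candidate wins; raises IndexError if nothing matches.
    row = lookup.loc[ps & bs].iloc[0]
    return pd.Series({"p": p, "b": b, "v": row["v"]})


result = data.apply(lambda x: find_lookup(ordered, x["p"], x["b"]), axis=1)
print(result)
```

Like the original, this scans the lookup once per row of data, so it is likely to be slower than a merge-based approach on large frames.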

answered Sep 23 '22 19:09

piRSquared
