Speed up pandas apply, or use map instead

Tags: python, pandas

I have a DataFrame and I want to fill a new column based on a lookup table. I can't use map, since each value in the lookup table is keyed by several index levels.

import pandas as pd
import numpy as np

d = pd.DataFrame({'I': np.random.randint(3, size=5),
                  'B0': np.random.choice([True, False], 5),
                  'B1': np.random.choice([True, False], 5)})

This is my data (my real data is much bigger):

      B0     B1  I
0   True  False  0
1  False  False  0
2  False  False  1
3   True  False  1
4  False   True  2

then my lookup table:

l = pd.DataFrame({(True, True): [1.1, 2.2, 3.3],
                  (True, False): [1.3, 2.1, 3.1],
                  (False, True): [1.2, 2.1, 3.1],
                  (False, False): [1.1, 2.0, 5.1]})
l.index.name = 'I'
l.columns.names = 'B0', 'B1'
l = l.stack(['B0', 'B1'])

which is

I  B0     B1   
0  False  False    1.1
          True     1.2
   True   False    1.3
          True     1.1
1  False  False    2.0
          True     2.1
   True   False    2.1
          True     2.2
2  False  False    5.1
          True     3.1
   True   False    3.1
          True     3.3

So I want to add a column w to my data by querying the lookup table on the values (I, B0, B1). I am using apply:

d['w'] = d.apply(lambda x: l[x['I'], x['B0'], x['B1']], axis=1)

and it works:

      B0     B1  I    w
0   True  False  0  1.3
1  False  False  0  1.1
2  False  False  1  2.0
3   True  False  1  2.1
4  False   True  2  3.1

The problem is that it is terribly slow. How can I speed this up?

asked May 31 '17 16:05

Ruggero Turra

2 Answers

This should be quicker: build the list of (I, B0, B1) keys once and look them all up with a single .loc call.

find_these = list(zip(d.I, d.B0, d.B1))
d.assign(w=l.loc[find_these].values)

      B0     B1  I    w
0   True  False  0  1.3
1  False  False  0  1.1
2  False  False  1  2.0
3   True  False  1  2.1
4  False   True  2  3.1
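
If some (I, B0, B1) combinations might be absent from the lookup table, one variant of the same idea is to reindex the lookup Series by the key columns, so that missing keys come back as NaN. This is only a minimal sketch under assumptions: it uses fixed sample data instead of the random draw, and the names lookup and keys, as well as the extra row with I=7, are illustrative and not from the question.

```python
import pandas as pd

# Lookup table from the question, built directly as a Series with a
# three-level MultiIndex (I, B0, B1).
idx = pd.MultiIndex.from_product(
    [[0, 1, 2], [False, True], [False, True]], names=['I', 'B0', 'B1'])
lookup = pd.Series(
    [1.1, 1.2, 1.3, 1.1, 2.0, 2.1, 2.1, 2.2, 5.1, 3.1, 3.1, 3.3], index=idx)

# Fixed sample data; the last row (I=7) is a hypothetical key that the
# lookup table does not contain.
d = pd.DataFrame({'I': [0, 0, 1, 1, 2, 7],
                  'B0': [True, False, False, True, False, True],
                  'B1': [False, False, False, False, True, True]})

# Build one MultiIndex from the key columns and reindex the lookup by it.
keys = pd.MultiIndex.from_frame(d[['I', 'B0', 'B1']])
d['w'] = lookup.reindex(keys).to_numpy()
print(d)
```

The lookup index has to be unique for reindex to work; the keys taken from d may repeat freely.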

With join

d.join(l.rename('w'), on=['I', 'B0', 'B1'])


      B0     B1  I    w
0   True  False  0  1.3
1  False  False  0  1.1
2  False  False  1  2.0
3   True  False  1  2.1
4  False   True  2  3.1
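
As a rough check of how the join behaves, the sketch below repeats it on fixed data with a non-default index. It assumes the lookup is already a Series indexed by (I, B0, B1); the name lookup and the index labels 10 to 14 are made up for illustration.

```python
import pandas as pd

idx = pd.MultiIndex.from_product(
    [[0, 1, 2], [False, True], [False, True]], names=['I', 'B0', 'B1'])
lookup = pd.Series(
    [1.1, 1.2, 1.3, 1.1, 2.0, 2.1, 2.1, 2.2, 5.1, 3.1, 3.1, 3.3], index=idx)

# Same rows as in the question, but with arbitrary index labels.
d = pd.DataFrame({'I': [0, 0, 1, 1, 2],
                  'B0': [True, False, False, True, False],
                  'B1': [False, False, False, False, True]},
                 index=[10, 11, 12, 13, 14])

# join is a left join by default: every row of d is kept, in order,
# and the Series name becomes the name of the new column.
out = d.join(lookup.rename('w'), on=['I', 'B0', 'B1'])
print(out)
```

Because the join is a left join, a row of d whose key is missing from the lookup would be kept with NaN in w rather than dropped.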

Timing (small data)

%%timeit
find_these = list(zip(d.I, d.B0, d.B1))
d.assign(w=l.loc[find_these].values)
100 loops, best of 3: 1.98 ms per loop

%timeit d.assign(w=d.apply(lambda x: l[x['I'], x['B0'], x['B1']], axis=1))
100 loops, best of 3: 11.8 ms per loop

%timeit d.join(l.rename('w'), on=['I', 'B0', 'B1'])
100 loops, best of 3: 1.99 ms per loop

%timeit d.merge(l.reset_index())
100 loops, best of 3: 2.89 ms per loop
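
The timings above are for the five-row frame, where fixed overhead dominates. To see how the gap changes with size, something like the following could be run outside IPython with the standard timeit module. It is only a sketch: the size n = 2000, the seed, and the names big, with_apply and with_join are assumptions, and no timing numbers are claimed here.

```python
import timeit

import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_product(
    [[0, 1, 2], [False, True], [False, True]], names=['I', 'B0', 'B1'])
lookup = pd.Series(
    [1.1, 1.2, 1.3, 1.1, 2.0, 2.1, 2.1, 2.2, 5.1, 3.1, 3.1, 3.3],
    index=idx, name='w')

# Hypothetical larger frame with the same three key columns.
rng = np.random.default_rng(0)
n = 2000
big = pd.DataFrame({'I': rng.integers(3, size=n),
                    'B0': rng.choice([True, False], n),
                    'B1': rng.choice([True, False], n)})

def with_apply():
    # Row-by-row lookup, as in the question.
    return big.apply(lambda x: lookup[x['I'], x['B0'], x['B1']], axis=1)

def with_join():
    # Single vectorised join on the three key columns.
    return big.join(lookup, on=['I', 'B0', 'B1'])['w']

t_apply = timeit.timeit(with_apply, number=1)
t_join = timeit.timeit(with_join, number=1)
print(f'apply: {t_apply:.4f} s, join: {t_join:.4f} s')

# Both approaches should produce the same column.
same = np.allclose(with_apply().to_numpy(), with_join().to_numpy())
print('results agree:', same)
```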

answered Oct 04 '22 03:10

piRSquared

We can merge d with a flattened l (after applying reset_index()):

In [5]: d.merge(l.reset_index())
Out[5]:
      B0     B1  I    0
0   True  False  0  1.3
1   True  False  0  1.3
2  False   True  0  1.2
3  False  False  0  1.1
4  False   True  2  3.1
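
The value column is labelled 0 in that output because the stacked Series has no name. A small sketch of the same merge with the Series named first, assuming fixed sample data; the names lookup and flat, the explicit on= list, and how='left' are additions for illustration and not part of the answer:

```python
import pandas as pd

idx = pd.MultiIndex.from_product(
    [[0, 1, 2], [False, True], [False, True]], names=['I', 'B0', 'B1'])
lookup = pd.Series(
    [1.1, 1.2, 1.3, 1.1, 2.0, 2.1, 2.1, 2.2, 5.1, 3.1, 3.1, 3.3], index=idx)

d = pd.DataFrame({'I': [0, 0, 1, 1, 2],
                  'B0': [True, False, False, True, False],
                  'B1': [False, False, False, False, True]})

# Naming the Series before reset_index gives the value column the label
# 'w' instead of 0.
flat = lookup.rename('w').reset_index()

# how='left' keeps every row of d in its original order; the key columns
# are spelled out rather than inferred from the shared column names.
out = d.merge(flat, on=['I', 'B0', 'B1'], how='left')
print(out)
```

Unlike join, merge returns a frame with a fresh 0..n-1 index, so a non-default index on d would need to be restored separately.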

answered Oct 04 '22 04:10

MaxU - stop WAR against UA
