
Speed up pandas apply, or use map instead

Tags:

python

pandas

I have a DataFrame and I want to fill a new column based on a lookup table. I can't use map because each value in the lookup table is keyed on several index levels rather than a single one.

import pandas as pd
import numpy as np

d = pd.DataFrame({'I': np.random.randint(3, size=5),
                  'B0': np.random.choice([True, False], 5),
                  'B1': np.random.choice([True, False], 5)})

This gives my data (the real data is much bigger):

      B0     B1  I
0   True  False  0
1  False  False  0
2  False  False  1
3   True  False  1
4  False   True  2

then my lookup table:

l = pd.DataFrame({(True, True): [1.1, 2.2, 3.3],
              (True, False): [1.3, 2.1, 3.1],
              (False, True): [1.2, 2.1, 3.1],
              (False, False): [1.1, 2.0, 5.1]}
             )
l.index.name = 'I'
l.columns.names = 'B0', 'B1'
l = l.stack(['B0', 'B1'])

which is

I  B0     B1   
0  False  False    1.1
          True     1.2
   True   False    1.3
          True     1.1
1  False  False    2.0
          True     2.1
   True   False    2.1
          True     2.2
2  False  False    5.1
          True     3.1
   True   False    3.1
          True     3.3

So I want to add a column w to my data by querying the lookup table on the values (I, B0, B1). I am using apply:

d['w'] = d.apply(lambda x: l[x['I'], x['B0'], x['B1']], axis=1)

and it works:

      B0     B1  I    w
0   True  False  0  1.3
1  False  False  0  1.1
2  False  False  1  2.0
3   True  False  1  2.1
4  False   True  2  3.1

The problem is that it is terribly slow. How can I speed this up?

Ruggero Turra asked May 31 '17 16:05



2 Answers

This should be quicker

find_these = list(zip(d.I, d.B0, d.B1))
d.assign(w=l.loc[find_these].values)

      B0     B1  I    w
0   True  False  0  1.3
1  False  False  0  1.1
2  False  False  1  2.0
3   True  False  1  2.1
4  False   True  2  3.1
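
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes a fixed copy of the question's sample rows (the original d is random), and the names fast and slow are only illustrative. It builds one (I, B0, B1) key per row, looks them all up in a single .loc call, and keeps the row-wise apply as a reference result.

```python
import pandas as pd

# Fixed copy of the question's sample rows (assumption: the original d is random).
d = pd.DataFrame({'I': [0, 0, 1, 1, 2],
                  'B0': [True, False, False, True, False],
                  'B1': [False, False, False, False, True]})

# Lookup table from the question, stacked into a Series indexed by (I, B0, B1).
l = pd.DataFrame({(True, True): [1.1, 2.2, 3.3],
                  (True, False): [1.3, 2.1, 3.1],
                  (False, True): [1.2, 2.1, 3.1],
                  (False, False): [1.1, 2.0, 5.1]})
l.index.name = 'I'
l.columns.names = ['B0', 'B1']
l = l.stack(['B0', 'B1'])

# One (I, B0, B1) tuple per row, resolved in a single .loc call.
find_these = list(zip(d.I, d.B0, d.B1))
fast = d.assign(w=l.loc[find_these].values)

# Row-wise apply from the question, kept only as a reference result.
slow = d.apply(lambda x: l[x['I'], x['B0'], x['B1']], axis=1)

print(fast)
print((fast['w'] == slow).all())
```

The .values call matters here: the looked-up Series carries the (I, B0, B1) MultiIndex, so assigning it without .values would try to align it against d's row index.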

With join

d.join(l.rename('w'), on=['I', 'B0', 'B1'])


      B0     B1  I    w
0   True  False  0  1.3
1  False  False  0  1.1
2  False  False  1  2.0
3   True  False  1  2.1
4  False   True  2  3.1
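
One property of the join version worth knowing: DataFrame.join is a left join by default, so a row whose key is not in the lookup table is kept and gets NaN. The sketch below illustrates this under an assumption of mine: a sixth row with I=3, which is not in the question's data and has no entry in the table.

```python
import pandas as pd

# Question's sample rows plus one hypothetical row (I=3) that is absent from the table.
d = pd.DataFrame({'I': [0, 0, 1, 1, 2, 3],
                  'B0': [True, False, False, True, False, True],
                  'B1': [False, False, False, False, True, True]})

# Lookup table from the question, stacked into a Series indexed by (I, B0, B1).
l = pd.DataFrame({(True, True): [1.1, 2.2, 3.3],
                  (True, False): [1.3, 2.1, 3.1],
                  (False, True): [1.2, 2.1, 3.1],
                  (False, False): [1.1, 2.0, 5.1]})
l.index.name = 'I'
l.columns.names = ['B0', 'B1']
l = l.stack(['B0', 'B1'])

# Naming the Series 'w' gives the new column its name; the three columns in
# `on` are matched against the three index levels of l, in order.
joined = d.join(l.rename('w'), on=['I', 'B0', 'B1'])
print(joined)
```

The first five rows get the same w values as above, and the added row comes back with NaN in w.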

Timing
small data

%%timeit
find_these = list(zip(d.I, d.B0, d.B1))
d.assign(w=l.loc[find_these].values)
100 loops, best of 3: 1.98 ms per loop

%timeit d.assign(w=d.apply(lambda x: l[x['I'], x['B0'], x['B1']], axis=1))
100 loops, best of 3: 11.8 ms per loop

%timeit d.join(l.rename('w'), on=['I', 'B0', 'B1'])
100 loops, best of 3: 1.99 ms per loop

%timeit d.merge(l.reset_index())
100 loops, best of 3: 2.89 ms per loop
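
The numbers above are for the five-row sample only. To repeat the comparison on larger data outside IPython, something like the following sketch could be used. The row count n, the seed, and the helper names by_apply, by_loc, by_join and by_merge are my own assumptions, and no timings are quoted because they depend on the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

n = 5_000  # hypothetical size, adjust as needed
rng = np.random.default_rng(0)
d = pd.DataFrame({'I': rng.integers(3, size=n),
                  'B0': rng.choice([True, False], n),
                  'B1': rng.choice([True, False], n)})

# Lookup table from the question, stacked into a Series indexed by (I, B0, B1).
l = pd.DataFrame({(True, True): [1.1, 2.2, 3.3],
                  (True, False): [1.3, 2.1, 3.1],
                  (False, True): [1.2, 2.1, 3.1],
                  (False, False): [1.1, 2.0, 5.1]})
l.index.name = 'I'
l.columns.names = ['B0', 'B1']
l = l.stack(['B0', 'B1'])


def by_apply():
    # Row-wise lookup from the question.
    return d.apply(lambda x: l[x['I'], x['B0'], x['B1']], axis=1).to_numpy()


def by_loc():
    # Single .loc call with a list of key tuples.
    return l.loc[list(zip(d.I, d.B0, d.B1))].to_numpy()


def by_join():
    # Join the named Series on the three key columns.
    return d.join(l.rename('w'), on=['I', 'B0', 'B1'])['w'].to_numpy()


def by_merge():
    # Merge with the flattened lookup table on the shared columns.
    return d.merge(l.rename('w').reset_index(), how='left')['w'].to_numpy()


for f in (by_apply, by_loc, by_join, by_merge):
    # Best of three single runs, reported in milliseconds.
    seconds = min(timeit.repeat(f, number=1, repeat=3))
    print(f'{f.__name__}: {seconds * 1e3:.1f} ms')
```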
piRSquared answered Oct 04 '22 03:10


We can merge d with a flattened l (after applying reset_index()). Note that d is generated randomly, so the rows below differ from the question's sample:

In [5]: d.merge(l.reset_index())
Out[5]:
      B0     B1  I    0
0   True  False  0  1.3
1   True  False  0  1.3
2  False   True  0  1.2
3  False  False  0  1.1
4  False   True  2  3.1
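
Two details of the merge output are easy to miss: the new column is called 0 because the stacked Series has no name, and merge defaults to an inner join, so rows of d without a match in the table are dropped. A possible variant, sketched here with a hypothetical extra row (I=3) that is not in the question's data, names the column w and compares the default with how='left'.

```python
import pandas as pd

# Question's sample rows plus one hypothetical row (I=3) that is absent from the table.
d = pd.DataFrame({'I': [0, 0, 1, 1, 2, 3],
                  'B0': [True, False, False, True, False, True],
                  'B1': [False, False, False, False, True, True]})

# Lookup table from the question, stacked into a Series indexed by (I, B0, B1).
l = pd.DataFrame({(True, True): [1.1, 2.2, 3.3],
                  (True, False): [1.3, 2.1, 3.1],
                  (False, True): [1.2, 2.1, 3.1],
                  (False, False): [1.1, 2.0, 5.1]})
l.index.name = 'I'
l.columns.names = ['B0', 'B1']
l = l.stack(['B0', 'B1'])

# Flatten the lookup table and name the value column 'w' instead of 0.
flat = l.rename('w').reset_index()

inner = d.merge(flat)             # default how='inner': the unmatched row is dropped
left = d.merge(flat, how='left')  # keeps every row of d, NaN where no match exists

print(inner)
print(left)
```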
MaxU - stop WAR against UA answered Oct 04 '22 04:10