
Fast subtraction of two dataframes ignoring indices (Python)

How do I subtract two dataframes, ignoring their indices, in the fastest way possible?

E.g., I want to subtract:

d1=
      x1
0 -3.141593
0 -3.141593
0 -3.141593
1 -2.443461
1 -2.443461

from

d2 = 
      x2
1 -2.443461
2 -1.745329
3 -1.047198
4 -0.349066
2 0.349066

What I have tried:

I can do it like this, e.g.:

dsub = d1.reset_index(drop=True) - d2.reset_index(drop=True)

However, I want to do the subtraction in the most efficient way possible. I have been looking around for an answer but I have only seen solutions that do not account for speed.

How do I accomplish this?


EDIT: Based on the answers, here are some timings from running each method on my machine:

For smaller dataframes:

Method 1 (a and b):

a: d1.reset_index(drop=True) - d2.reset_index(drop=True)
b: d1.reset_index(drop=True).sub(d2.reset_index(drop=True))
~1024.91 usec/pass

Method 2:

d1 - d2.values
~784.79 usec/pass

Method 3:

pd.DataFrame(d1.values - d2.values, d1.index, ['x1-x2'])
~653.82 usec/pass
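
A rough way to reproduce this kind of comparison is sketched below with the standard `timeit` module. The frames are rebuilt by hand from the sample data, and the names `d2_same_col`, `methods` and `timings` are only illustrative. One assumption to note: DataFrame arithmetic aligns on column labels as well as on the index, so for Method 1 the sketch renames `x2` to `x1`; with differing column names that method would return NaN columns. Absolute numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Sample frames from the question (duplicate, non-matching index labels).
d1 = pd.DataFrame({"x1": [-3.141593, -3.141593, -3.141593, -2.443461, -2.443461]},
                  index=[0, 0, 0, 1, 1])
d2 = pd.DataFrame({"x2": [-2.443461, -1.745329, -1.047198, -0.349066, 0.349066]},
                  index=[1, 2, 3, 4, 2])

# Assumption for Method 1: give both frames the same column label so that
# label alignment on the columns does not produce NaN.
d2_same_col = d2.rename(columns={"x2": "x1"})

methods = {
    "1a": lambda: d1.reset_index(drop=True) - d2_same_col.reset_index(drop=True),
    "1b": lambda: d1.reset_index(drop=True).sub(d2_same_col.reset_index(drop=True)),
    "2": lambda: d1 - d2.values,
    "3": lambda: pd.DataFrame(d1.values - d2.values, d1.index, ["x1-x2"]),
}

# Reference result: plain positional subtraction of the underlying arrays.
expected = d1["x1"].to_numpy() - d2["x2"].to_numpy()

timings = {}
n_passes = 200
for name, fn in methods.items():
    # Each method should give the same numbers, whatever labels it keeps.
    assert np.allclose(fn().to_numpy().ravel(), expected)
    timings[name] = timeit.timeit(fn, number=n_passes) / n_passes * 1e6
    print(f"Method {name}: ~{timings[name]:.2f} usec/pass")
```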

For very large dataframes please see @MaxU's answer below.

jesperk.eth asked Nov 07 '16 22:11


2 Answers

You can do it this way:

d1 - d2.values

or:

d1.x1 - d2.x2.values

Demo:

In [172]: d1 - d2.values
Out[172]:
         x1
0 -0.698132
0 -1.396264
0 -2.094395
1 -2.094395
1 -2.792527

In [173]: d1.x1 - d2.x2.values
Out[173]:
0   -0.698132
0   -1.396264
0   -2.094395
1   -2.094395
1   -2.792527
Name: x1, dtype: float64
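
As a self-contained sketch of why this works: `.values` returns the underlying NumPy array, so pandas has no labels on the right-hand side to align and the subtraction is purely positional. The frames below are rebuilt by hand from the question; `.to_numpy()` is the newer equivalent of `.values` (available since pandas 0.24), and the names `by_values`, `by_numpy` and `as_series` are illustrative only. The approach assumes both frames have the same shape.

```python
import numpy as np
import pandas as pd

# Sample frames from the question (duplicate, non-matching index labels).
d1 = pd.DataFrame({"x1": [-3.141593, -3.141593, -3.141593, -2.443461, -2.443461]},
                  index=[0, 0, 0, 1, 1])
d2 = pd.DataFrame({"x2": [-2.443461, -1.745329, -1.047198, -0.349066, 0.349066]},
                  index=[1, 2, 3, 4, 2])

# Positional subtraction only makes sense when the shapes match.
assert d1.shape == d2.shape

# The right-hand side is a bare array, so the result keeps d1's index
# and d1's column name; d2's labels play no role.
by_values = d1 - d2.values
by_numpy = d1 - d2.to_numpy()

# Series variant: the result is a Series named after d1's column.
as_series = d1["x1"] - d2["x2"].to_numpy()

print(by_values)
print(as_series)
```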

Timing for bigger DFs:

In [180]: d1 = pd.concat([d1] * 10**5, ignore_index=True)

In [181]: d2 = pd.concat([d2] * 10**5, ignore_index=True)

In [182]: d1.shape
Out[182]: (500000, 1)

In [183]: %timeit pd.DataFrame(d1.values - d2.values, d1.index, ['x1-x2'])
100 loops, best of 3: 4.07 ms per loop

In [184]: %timeit d1 - d2.values
100 loops, best of 3: 3.99 ms per loop

In [185]: d1 = pd.concat([d1] * 10, ignore_index=True)

In [186]: d2 = pd.concat([d2] * 10, ignore_index=True)

In [187]: d1.shape
Out[187]: (5000000, 1)

In [188]: %timeit pd.DataFrame(d1.values - d2.values, d1.index, ['x1-x2'])
10 loops, best of 3: 19.9 ms per loop

In [189]: %timeit d1 - d2.values
100 loops, best of 3: 14 ms per loop

In [190]: %timeit d1.reset_index(drop=True) - d2.reset_index(drop=True)
1 loop, best of 3: 242 ms per loop

In [191]: %timeit d1.reset_index(drop=True).sub(d2.reset_index(drop=True))
1 loop, best of 3: 242 ms per loop
MaxU - stop WAR against UA answered Nov 06 '22 10:11


dsub = pd.DataFrame(d1.values - d2.values, d1.index, ['x1-x2'])

dsub
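
With the sample data from the question, the line above can be run as the following sketch. The frames are rebuilt by hand here, which is an assumption about how `d1` and `d2` were created. The constructor arguments are positional: data, index, columns. The result therefore keeps `d1`'s index and gets a new column label, `x1-x2`.

```python
import numpy as np
import pandas as pd

# Sample frames from the question (duplicate, non-matching index labels).
d1 = pd.DataFrame({"x1": [-3.141593, -3.141593, -3.141593, -2.443461, -2.443461]},
                  index=[0, 0, 0, 1, 1])
d2 = pd.DataFrame({"x2": [-2.443461, -1.745329, -1.047198, -0.349066, 0.349066]},
                  index=[1, 2, 3, 4, 2])

# Subtract the raw arrays, then wrap the result in a new frame that
# reuses d1's index and carries an explicit column label.
dsub = pd.DataFrame(d1.values - d2.values, d1.index, ["x1-x2"])

print(dsub)
```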

piRSquared answered Nov 06 '22 12:11