Cubic Root of Pandas DataFrame

I understand how to take the cube root of both positive and negative numbers. But when I try to use the apply-lambda method to process all elements of a dataframe efficiently, I run into an ambiguity error. Interestingly, this error does not arise with equalities, so I am wondering what could be wrong with the code:

sample[columns]=sample[columns].apply(lambda x: (-1)*np.power(-x,1./3) if x<0 else np.power(x,1./3))
mamafoku asked Mar 09 '23 21:03


2 Answers

It looks like you are passing a list or array of column names. I assume this because your variable name is plural. If so, sample[columns] is a dataframe. That is a problem because apply iterates over the columns and passes each one, as a whole Series, to your lambda. So you effectively get

(-1) * np.power(-series_object, 1./3) if series_object < 0 else...

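And it's the series_object < 0 that is messing things up, because you are asking for the truthiness of a whole series being less than zero. The comparison returns a boolean Series, and pandas raises a ValueError saying its truth value is ambiguous rather than guessing whether you meant any or all.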
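A minimal sketch of that failure, assuming pandas and NumPy are installed. The data, the column name a, and the helper name cube_root_scalar are made up for illustration and are not from the question.

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({"a": [-8.0, 1.0, 27.0]})  # hypothetical data


def cube_root_scalar(x):
    # Written for a single number: `x < 0` has to be one True/False value.
    return -np.power(-x, 1. / 3) if x < 0 else np.power(x, 1. / 3)


# DataFrame.apply hands each whole column (a Series) to the function,
# so `x < 0` is a boolean Series and `if` cannot reduce it to one bool.
try:
    sample.apply(cube_root_scalar)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

The printed message should mention that the truth value of a Series is ambiguous, which is the error the question describes.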


Use applymap, which calls the function on each element instead of on each column:

f = lambda x: -np.power(-x, 1./3) if x < 0 else np.power(x, 1./3)
sample[columns] = sample[columns].applymap(f)
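One caveat for newer installations: since pandas 2.1, applymap is deprecated in favor of DataFrame.map, which does the same element-wise job. The sketch below picks whichever method is available; the data and the elementwise name are hypothetical, chosen only for the example.

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({"a": [-8.0, 1.0], "b": [27.0, -1.0]})  # hypothetical data
columns = ["a", "b"]

# Scalar cube root: safe here because each call receives a single element.
f = lambda x: -np.power(-x, 1. / 3) if x < 0 else np.power(x, 1. / 3)

# Prefer DataFrame.map where it exists (pandas >= 2.1); otherwise use applymap.
subset = sample[columns]
elementwise = subset.map if hasattr(subset, "map") else subset.applymap
sample[columns] = elementwise(f)

print(sample)
```

Either way the function runs once per cell, so this is the slower, pure-Python route compared with a vectorized expression.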

That said, I'd use a lambda that avoids the conditional altogether, defined as follows

f = lambda x: np.sign(x) * np.power(abs(x), 1./3)

Then you could perform this on the entire dataframe

np.random.seed([3,1415])
df = pd.DataFrame(np.random.randint(-10, 10, (5, 5)))

df

    0  1   2  3   4
0   6  1  -8  0   5
1   3  1   3  9  -2
2 -10  2 -10 -8 -10
3  -3  9   3  8   2
4  -6 -7   9  3  -3

f = lambda x: np.sign(x) * np.power(abs(x), 1./3)
f(df)

          0         1         2         3         4
0  1.817121  1.000000 -2.000000  0.000000  1.709976
1  1.442250  1.000000  1.442250  2.080084 -1.259921
2 -2.154435  1.259921 -2.154435 -2.000000 -2.154435
3 -1.442250  2.080084  1.442250  2.000000  1.259921
4 -1.817121 -1.912931  2.080084  1.442250 -1.442250

Same as

df.applymap(f)

          0         1         2         3         4
0  1.817121  1.000000 -2.000000  0.000000  1.709976
1  1.442250  1.000000  1.442250  2.080084 -1.259921
2 -2.154435  1.259921 -2.154435 -2.000000 -2.154435
3 -1.442250  2.080084  1.442250  2.000000  1.259921
4 -1.817121 -1.912931  2.080084  1.442250 -1.442250

Check for equality

df.applymap(f).equals(f(df))

True

And it's faster

%timeit df.applymap(f)
%timeit f(df)

1000 loops, best of 3: 1.11 ms per loop
1000 loops, best of 3: 473 µs per loop
piRSquared answered Mar 20 '23 13:03


It doesn't have to be complicated. Simply use NumPy's cube-root function, np.cbrt:

df[columns] = np.cbrt(df[columns])

It requires NumPy >= 1.10 though.
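As a small illustration of why np.cbrt is convenient here, the sketch below contrasts it with a plain fractional power, which produces NaN for negative floats. The data and the names naive and roots are assumptions made for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [-8.0, 1.0, 27.0], "b": [0.0, -1.0, 8.0]})  # hypothetical data
columns = ["a", "b"]

# A fractional power of a negative float is undefined over the reals,
# so NumPy returns NaN for those entries (warning silenced here).
with np.errstate(invalid="ignore"):
    naive = np.power(df[columns].to_numpy(), 1. / 3)

# np.cbrt handles negative inputs directly and keeps the DataFrame's labels.
roots = np.cbrt(df[columns])

print(naive)
print(roots)
```

Because np.cbrt is a NumPy ufunc, it operates on the whole selection at once, with no apply or lambda needed.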


For older versions you could use np.absolute and np.sign instead of using conditionals:

df[columns] = df[columns].apply(lambda x: np.power(np.absolute(x), 1./3) * np.sign(x))

This calculates the cube root of the absolute value and then restores the sign. It works with apply because np.absolute, np.power, and np.sign all operate on a whole column at once.

MSeifert answered Mar 20 '23 15:03
