I understand how to take the cube root of both positive and negative numbers. But when I try to use the apply-lambda method to efficiently process all elements of a dataframe, I run into an ambiguity issue. Interestingly, this error does not arise with equalities, so I am wondering what could be wrong with the code:
sample[columns]=sample[columns].apply(lambda x: (-1)*np.power(-x,1./3) if x<0 else np.power(x,1./3))
It looks like you are passing a list or array of column names. I assume this because your variable name is plural, with an s at the end. If this is the case, then sample[columns] is a dataframe. This is an issue because apply iterates through each column, passing each column as a whole series to the lambda you gave to apply. So you get
(-1) * np.power(-series_object, 1./3) if series_object < 0 else...
And it's the series_object < 0 that is messing things up, because you are asking for the truthiness of a whole series being less than zero.
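To see where the ambiguity comes from, here is a minimal sketch that reproduces it. The series s is made-up data for illustration, not the asker's sample.

```python
import pandas as pd

# Hypothetical data for illustration only.
s = pd.Series([-8.0, 1.0, 27.0])

# Comparing a Series with a scalar yields a boolean Series, not a single bool.
mask = s < 0
print(mask.tolist())

# Using that boolean Series where Python expects one True/False raises ValueError.
error_message = None
try:
    if s < 0:
        pass
except ValueError as err:
    error_message = str(err)
    print(error_message)
```

The conditional expression inside the lambda performs exactly this kind of truth test, once per column.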
Use applymap instead, which calls the function on each element individually:
f = lambda x: -np.power(-x, 1./3) if x < 0 else np.power(x, 1./3)
sample[columns] = sample[columns].applymap(f)
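One caveat for newer environments: pandas 2.1 deprecated DataFrame.applymap in favor of DataFrame.map. A version-tolerant sketch might look like the following, where demo and signed_cbrt are hypothetical names chosen for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only.
demo = pd.DataFrame({"a": [-8.0, 1.0], "b": [27.0, -64.0]})

def signed_cbrt(x):
    # x is a single scalar here, so the conditional is unambiguous.
    return -np.power(-x, 1./3) if x < 0 else np.power(x, 1./3)

# Prefer DataFrame.map where it exists (pandas >= 2.1), else fall back to applymap.
elementwise = demo.map if hasattr(demo, "map") else demo.applymap
result = elementwise(signed_cbrt)
print(result)
```

Either method visits one element at a time, which is why the x < 0 test works here but not under apply.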
That said, I'd use a lambda defined as follows:
f = lambda x: np.sign(x) * np.power(abs(x), 1./3)
Then you could perform this on the entire dataframe
np.random.seed([3,1415])
df = pd.DataFrame(np.random.randint(-10, 10, (5, 5)))
df
    0  1   2  3   4
0   6  1  -8  0   5
1   3  1   3  9  -2
2 -10  2 -10 -8 -10
3  -3  9   3  8   2
4  -6 -7   9  3  -3
f = lambda x: np.sign(x) * np.power(abs(x), 1./3)
f(df)
          0         1         2         3         4
0  1.817121  1.000000 -2.000000  0.000000  1.709976
1  1.442250  1.000000  1.442250  2.080084 -1.259921
2 -2.154435  1.259921 -2.154435 -2.000000 -2.154435
3 -1.442250  2.080084  1.442250  2.000000  1.259921
4 -1.817121 -1.912931  2.080084  1.442250 -1.442250
Same as
df.applymap(f)
          0         1         2         3         4
0  1.817121  1.000000 -2.000000  0.000000  1.709976
1  1.442250  1.000000  1.442250  2.080084 -1.259921
2 -2.154435  1.259921 -2.154435 -2.000000 -2.154435
3 -1.442250  2.080084  1.442250  2.000000  1.259921
4 -1.817121 -1.912931  2.080084  1.442250 -1.442250
Check for equality
df.applymap(f).equals(f(df))
True
And calling f directly on the dataframe is faster than applymap:
%timeit df.applymap(f)
%timeit f(df)
1000 loops, best of 3: 1.11 ms per loop
1000 loops, best of 3: 473 µs per loop
It doesn't have to be complicated; simply use NumPy's cube-root function, np.cbrt:
df[columns] = np.cbrt(df[columns])
It requires NumPy >= 1.10, though.
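As a quick illustration of why np.cbrt is convenient here, the sketch below compares it with a plain fractional power on a made-up array named values.

```python
import numpy as np

# Hypothetical data for illustration only.
values = np.array([-27.0, -8.0, 0.0, 8.0, 27.0])

# np.cbrt is defined for negative inputs and keeps their sign.
roots = np.cbrt(values)
print(roots)

# np.power with a fractional exponent returns nan for negative bases
# (it also emits a RuntimeWarning, silenced here).
with np.errstate(invalid="ignore"):
    naive = np.power(values, 1./3)
print(naive)
```

The nan entries in naive are the reason the other approaches need a sign workaround at all.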
For older versions you could use np.absolute and np.sign instead of conditionals:
df[columns] = df[columns].apply(lambda x: np.power(np.absolute(x), 1./3) * np.sign(x))
This calculates the cube root of the absolute value and then restores the original sign.
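A small sanity check, assuming a NumPy version that has np.cbrt, can confirm that the fallback gives the same numbers. The frame df and the list columns below are hypothetical stand-ins for the asker's data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame and column list for illustration only.
df = pd.DataFrame({
    "a": [-8.0, 1.0, 27.0],
    "b": [0.0, -1.0, 64.0],
    "c": [1.0, 2.0, 3.0],
})
columns = ["a", "b"]

# apply hands each column to the lambda as a Series; every operation inside
# is vectorized, so no truth-value test on a Series is needed.
fallback = df[columns].apply(lambda x: np.power(np.absolute(x), 1./3) * np.sign(x))

# Compare against np.cbrt on the same columns.
reference = np.cbrt(df[columns])
print(np.allclose(fallback.to_numpy(), reference.to_numpy()))
```

Because the lambda never branches on x, apply works here even though it passes whole columns.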