Is there a way I could use a scipy function like norm.cdf in place on a numpy.array (or pandas.DataFrame), using a variant of numpy.apply, numpy.apply_along_axis, etc.?
The background is that I have a table of z-score values that I would like to convert to CDF values of the normal distribution. I'm currently using norm.cdf from scipy for this.
I'm currently manipulating a dataframe that has non-numeric values.
  Name      Val1      Val2      Val3      Val4
0    A -1.540369 -0.077779  0.979606 -0.667112
1    B -0.787154  0.048412  0.775444 -0.510904
2    C -0.477234  0.414388  1.250544 -0.411658
3    D -1.430851  0.258759  1.247752 -0.883293
4    E -0.360181  0.485465  1.123589 -0.379157
(Making the Name variable an index is a solution, but in my actual dataset, the names are not alphabetical characters.)
To modify only the numeric data, I'm using df._get_numeric_data(), a private method that returns a dataframe containing only the original dataframe's numeric columns. However, there is no corresponding set method. Hence, if I call
norm.cdf(df._get_numeric_data())
this won't change df's original data.
I'm trying to circumvent this by applying norm.cdf to the numeric dataframe in place, so that it changes my original dataset.
The pandas DataFrame apply() method applies a function along one axis of the DataFrame. The default is axis=0, the index (row) axis, which means the function is called once per column.
apply() has no inplace parameter, unlike some other pandas methods; it always returns a new object.
Use apply() with axis=1 when you want to call a custom function on every row. From each row's values you can, for example, build a new column or compute an updated row, and then assign the result back to the DataFrame.
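As a rough illustration of the two axis settings described above, here is a minimal sketch. The small DataFrame and the column names a, b, and total are made up for the example and are not from the question.

```python
import pandas as pd

# Hypothetical toy data, only for illustrating apply()
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# axis=0 (the default): the function receives each column as a Series
col_sums = df.apply(lambda col: col.sum())

# axis=1: the function receives each row as a Series
row_sums = df.apply(lambda row: row["a"] + row["b"], axis=1)

# apply() returns a new object, so the result has to be assigned back
df["total"] = row_sums
```

Since nothing is modified in place, the original columns a and b are untouched; only the explicit assignment adds the total column.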
I think I would prefer select_dtypes over _get_numeric_data:
In [11]: df.select_dtypes(include=[np.number])
Out[11]:
       Val1      Val2      Val3      Val4
0 -1.540369 -0.077779  0.979606 -0.667112
1 -0.787154  0.048412  0.775444 -0.510904
2 -0.477234  0.414388  1.250544 -0.411658
3 -1.430851  0.258759  1.247752 -0.883293
4 -0.360181  0.485465  1.123589 -0.379157
Although apply doesn't offer an inplace option, you could do something like the following (which I would argue is more explicit anyway):
num_df = df.select_dtypes(include=[np.number])
df[num_df.columns] = norm.cdf(num_df.values)
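For completeness, here is a self-contained sketch of the same idea with the imports included. It uses only the first three rows and two value columns of the table from the question, and the variable name num_cols is just an illustrative choice.

```python
import numpy as np
import pandas as pd
from scipy.stats import norm

# A cut-down version of the question's table (3 rows, 2 numeric columns)
df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Val1": [-1.540369, -0.787154, -0.477234],
    "Val2": [-0.077779, 0.048412, 0.414388],
})

# Pick out the numeric columns by dtype; "Name" is left out
num_cols = df.select_dtypes(include=[np.number]).columns

# norm.cdf works elementwise on the 2-D array and returns a new array,
# which is then written back into the same columns of df
df[num_cols] = norm.cdf(df[num_cols].to_numpy())
```

After this, the Name column is unchanged and every numeric column holds values strictly between 0 and 1.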