 

How to apply function to dataframe in place

Is there a way to apply a SciPy function such as norm.cdf in place to a numpy.array (or pandas.DataFrame), using a variant of numpy.apply, numpy.apply_along_axis, etc.?


The background: I have a table of z-scores that I would like to convert to CDF values of the standard normal distribution. I'm currently using norm.cdf from scipy.stats for this.
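For the conversion itself, a minimal sketch: norm.cdf is vectorized, so it evaluates element-wise over a whole NumPy array without any apply-style loop. The sample z-scores are a few values from the table in this question, plus 0.0 added for illustration.

```python
import numpy as np
from scipy.stats import norm

# A few z-scores from the question's table, plus 0.0 for illustration
z = np.array([-1.540369, -0.077779, 0.979606, 0.0])

# norm.cdf is vectorized: it evaluates the standard normal CDF
# element-wise and returns a new array of the same shape
p = norm.cdf(z)
print(p)
```

Because the CDF is monotonic, larger z-scores map to larger probabilities, and a z-score of 0 maps to 0.5.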

The dataframe I'm working with also contains non-numeric values:

      Name      Val1      Val2      Val3      Val4 
0        A -1.540369 -0.077779  0.979606 -0.667112   
1        B -0.787154  0.048412  0.775444 -0.510904   
2        C -0.477234  0.414388  1.250544 -0.411658   
3        D -1.430851  0.258759  1.247752 -0.883293   
4        E -0.360181  0.485465  1.123589 -0.379157

(Making the Name column the index would be one solution, but in my actual dataset the names are not alphabetical characters.)
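For reference, the index-based approach might look like the following sketch. It assumes the label column is literally called "Name", as in the sample table, and rebuilds that table by hand. Index labels do not have to be alphabetical strings; any hashable values work.

```python
import pandas as pd
from scipy.stats import norm

# Sample data reconstructed from the question's table
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E"],
    "Val1": [-1.540369, -0.787154, -0.477234, -1.430851, -0.360181],
    "Val2": [-0.077779, 0.048412, 0.414388, 0.258759, 0.485465],
    "Val3": [0.979606, 0.775444, 1.250544, 1.247752, 1.123589],
    "Val4": [-0.667112, -0.510904, -0.411658, -0.883293, -0.379157],
})

# Move the non-numeric column into the index so every remaining column is numeric
indexed = df.set_index("Name")

# Convert all values at once, keep the same labels, then turn "Name" back into a column
result = pd.DataFrame(
    norm.cdf(indexed.to_numpy()),
    index=indexed.index,
    columns=indexed.columns,
).reset_index()
print(result)
```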

To modify only the numeric data, I'm using df._get_numeric_data(), a private method that returns a new dataframe containing only the dataframe's numeric columns. However, there is no corresponding setter. Hence, if I call

norm.cdf(df._get_numeric_data())

the result is a new array, and df's original data is not changed.
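The behaviour described here can be checked with a small sketch. It uses the public select_dtypes rather than the private _get_numeric_data, and a two-row frame made up from the first rows of the table: norm.cdf hands back a fresh ndarray and nothing is written into df.

```python
import numpy as np
import pandas as pd
from scipy.stats import norm

# Small frame made up from the first two rows of the question's table
df = pd.DataFrame({
    "Name": ["A", "B"],
    "Val1": [-1.540369, -0.787154],
    "Val2": [-0.077779, 0.048412],
})
before = df.copy()

numeric = df.select_dtypes(include=[np.number])
out = norm.cdf(numeric)  # a brand-new ndarray; nothing is written back into df
print(type(out))
print(df.equals(before))  # df still holds the original z-scores
```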

I'd like to get around this by applying norm.cdf to the numeric part of the dataframe in place, so that it modifies my original dataset.

asked Feb 22 '15 18:02 by hlin117





1 Answer

I think I would prefer select_dtypes over _get_numeric_data:

In [11]: df.select_dtypes(include=[np.number])
Out[11]:
       Val1      Val2      Val3      Val4
0 -1.540369 -0.077779  0.979606 -0.667112
1 -0.787154  0.048412  0.775444 -0.510904
2 -0.477234  0.414388  1.250544 -0.411658
3 -1.430851  0.258759  1.247752 -0.883293
4 -0.360181  0.485465  1.123589 -0.379157

Although apply doesn't offer an inplace parameter, you could do something like the following (which I'd argue is more explicit anyway):

import numpy as np
from scipy.stats import norm
num_df = df.select_dtypes(include=[np.number])
df[num_df.columns] = norm.cdf(num_df.values)
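A self-contained version of the same idea, assuming the sample table from the question as input. The numeric columns of the existing df object are overwritten with their CDF values, while Name is left untouched.

```python
import numpy as np
import pandas as pd
from scipy.stats import norm

# Sample data reconstructed from the question's table
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E"],
    "Val1": [-1.540369, -0.787154, -0.477234, -1.430851, -0.360181],
    "Val2": [-0.077779, 0.048412, 0.414388, 0.258759, 0.485465],
    "Val3": [0.979606, 0.775444, 1.250544, 1.247752, 1.123589],
    "Val4": [-0.667112, -0.510904, -0.411658, -0.883293, -0.379157],
})

# Pick out the numeric columns, convert them, and assign the 2-D result
# back to those same columns of the original frame
num_df = df.select_dtypes(include=[np.number])
df[num_df.columns] = norm.cdf(num_df.values)
print(df)
```

Assigning back by column label keeps the non-numeric column in place without having to move it into the index.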
answered Sep 19 '22 13:09 by Andy Hayden
