How to apply function to dataframe in place

Tags: python, pandas, vectorization, scipy

Is there a way I could use a scipy function like norm.cdf in place on a numpy.array (or pandas.DataFrame), using a variant of numpy.apply, numpy.apply_along_axis, etc.?

The background is, I have a table of z-score values that I would like to convert to CDF values of the normal distribution. I'm currently using norm.cdf from scipy for this.

I'm currently manipulating a dataframe that has non-numeric values.

      Name      Val1      Val2      Val3      Val4 
0        A -1.540369 -0.077779  0.979606 -0.667112   
1        B -0.787154  0.048412  0.775444 -0.510904   
2        C -0.477234  0.414388  1.250544 -0.411658   
3        D -1.430851  0.258759  1.247752 -0.883293   
4        E -0.360181  0.485465  1.123589 -0.379157

(Making the Name variable an index is a solution, but in my actual dataset, the names are not alphabetical characters.)

To modify only the numeric data, I'm using df._get_numeric_data(), a private method that returns a dataframe containing only the original dataframe's numeric columns. However, there is no corresponding setter. Hence, if I call

norm.cdf(df._get_numeric_data())

this won't change df's original data.

I'm trying to circumvent this by applying norm.cdf to the numeric dataframe in place, so that it changes my original dataset.

asked Feb 22 '15 18:02

hlin117

1 Answer

I think I would prefer select_dtypes over _get_numeric_data:

In [11]: df.select_dtypes(include=[np.number])
Out[11]:
       Val1      Val2      Val3      Val4
0 -1.540369 -0.077779  0.979606 -0.667112
1 -0.787154  0.048412  0.775444 -0.510904
2 -0.477234  0.414388  1.250544 -0.411658
3 -1.430851  0.258759  1.247752 -0.883293
4 -0.360181  0.485465  1.123589 -0.379157

Although apply doesn't offer an inplace option, you could do something like the following (which I would argue is more explicit anyway):

import numpy as np
from scipy.stats import norm
num_df = df.select_dtypes(include=[np.number])
df[num_df.columns] = norm.cdf(num_df.values)
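
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same assignment. The three-row frame is made up for illustration (it reuses a few values from the question), and to_numpy() is used in place of .values purely as a matter of taste; treat it as one way to do it rather than the only one.

```python
import numpy as np
import pandas as pd
from scipy.stats import norm

# Hypothetical sample frame mirroring the layout in the question:
# one non-numeric column plus a few columns of z-scores.
df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Val1": [-1.540369, -0.787154, -0.477234],
    "Val2": [-0.077779, 0.048412, 0.414388],
})

# Pick out the numeric columns; this is a separate frame, so changing it
# alone would not touch df.
num_df = df.select_dtypes(include=[np.number])

# norm.cdf returns a new array of the same shape. Assigning it back by
# column label overwrites those columns in df and leaves "Name" untouched.
df[num_df.columns] = norm.cdf(num_df.to_numpy())

print(df)
```

Strictly speaking this replaces the numeric columns rather than mutating the underlying buffer, but the effect on df is what the question asks for: the z-scores become CDF values and the non-numeric column is preserved.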

answered Sep 19 '22 13:09

Andy Hayden
