Pandas - Compute z-score for all columns

I have a dataframe containing a single column of IDs and all other columns are numerical values for which I want to compute z-scores. Here's a subsection of it:

ID      Age    BMI     Risk Factor
PT 6    48     19.3    4
PT 8    43     20.9    NaN
PT 2    39     18.1    3
PT 9    41     19.5    NaN

Some of my columns contain NaN values, which I do not want to include in the z-score calculations, so I intend to use a solution offered in an answer to this question: how to zscore normalize pandas column with nans?

df['zscore'] = (df.a - df.a.mean())/df.a.std(ddof=0)
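This formula handles missing values because pandas `Series.mean()` and `Series.std()` skip NaN by default (`skipna=True`). The statistics are computed from the non-missing entries only, and the NaN rows simply stay NaN in the result. A minimal sketch using the Risk Factor values from the sample above (the variable names are illustrative, not from the linked answer):

```python
import numpy as np
import pandas as pd

# Risk Factor values from the sample, including the two missing entries
risk = pd.Series([4, np.nan, 3, np.nan])

# mean() and std() skip NaN by default (skipna=True)
mean = risk.mean()        # average of the two non-missing values
std = risk.std(ddof=0)    # population standard deviation of the same two values

# NaN inputs propagate to NaN outputs; the other rows get a z-score
zscore = (risk - mean) / std
print(zscore)
```

Note that `ddof=0` gives the population standard deviation; pandas defaults to `ddof=1` (sample standard deviation) if the argument is omitted.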

I'm interested in applying this solution to all of my columns except the ID column to produce a new dataframe which I can save as an Excel file using

df2.to_excel("Z-Scores.xlsx")

So basically, how can I compute z-scores for each column (ignoring NaN values) and push everything into a new dataframe?

SIDENOTE: there is a concept in pandas called "indexing" which intimidates me because I do not understand it well. If indexing is a crucial part of solving this problem, please dumb down your explanation of indexing.

872

asked Jul 15 '14 15:07

Slavatron

1 Answer

Build a list from the columns and remove the column you don't want to calculate the Z score for:

In [66]:
cols = list(df.columns)
cols.remove('ID')
df[cols]

Out[66]:
   Age  BMI  Risk  Factor
0    6   48  19.3       4
1    8   43  20.9     NaN
2    2   39  18.1       3
3    9   41  19.5     NaN

In [68]:
# now iterate over the remaining columns and create a new zscore column
for col in cols:
    col_zscore = col + '_zscore'
    df[col_zscore] = (df[col] - df[col].mean())/df[col].std(ddof=0)
df

Out[68]:
   ID  Age  BMI  Risk  Factor  Age_zscore  BMI_zscore  Risk_zscore  \
0  PT    6   48  19.3       4   -0.093250    1.569614    -0.150946
1  PT    8   43  20.9     NaN    0.652753    0.074744     1.459148
2  PT    2   39  18.1       3   -1.585258   -1.121153    -1.358517
3  PT    9   41  19.5     NaN    1.025755   -0.523205     0.050315

   Factor_zscore
0              1
1            NaN
2             -1
3            NaN
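In the output above, the sample appears to have been split on whitespace when it was loaded, so "PT 6" became ID `PT` and Age `6`, and the numbers sit one column to the right of their labels in the question. The method itself is unaffected. If the goal is a separate dataframe rather than extra columns, the same formula can be applied to all numeric columns at once, because subtracting or dividing a DataFrame by a Series lines the values up by column name. A minimal sketch, assuming the column names from the question; `df2` and the file name are taken from the question, and writing `.xlsx` needs an Excel engine such as openpyxl to be installed:

```python
import numpy as np
import pandas as pd

# Sample data as laid out in the question
df = pd.DataFrame({
    "ID": ["PT 6", "PT 8", "PT 2", "PT 9"],
    "Age": [48, 43, 39, 41],
    "BMI": [19.3, 20.9, 18.1, 19.5],
    "Risk Factor": [4, np.nan, 3, np.nan],
})

# Every column except the ID column
cols = [c for c in df.columns if c != "ID"]

# mean() and std() return one value per column and skip NaN by default;
# the arithmetic then matches those values to columns by name
values = df[cols]
zscores = (values - values.mean()) / values.std(ddof=0)

# New dataframe: the ID column followed by the z-scored columns
df2 = pd.concat([df[["ID"]], zscores], axis=1)
print(df2)

# Uncomment to save; requires an Excel writer engine such as openpyxl
# df2.to_excel("Z-Scores.xlsx", index=False)
```

No indexing beyond selecting columns by name (`df[cols]`) is needed here. The row labels 0 to 3 on the left are the default index, and `index=False` keeps them out of the Excel file.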

161

answered Sep 22 '22 06:09

EdChum

Tags: python, indexing, pandas, statistics
