Python - Calculating standard deviation (row level) of dataframe columns

Tags: python-3.x, pandas

I have created a pandas DataFrame and can compute the standard deviation of one or more of its columns (column level). What I need is the standard deviation at the row level for a particular column. These are the commands I have tried so far:

# Will determine the standard deviation of all the numerical columns by default.
inp_df.std()

salary         8.194421e-01
num_months     3.690081e+05
no_of_hours    2.518869e+02

# Same as above command. Performs the standard deviation at the column level.
inp_df.std(axis = 0)

# Determines the standard deviation over only the salary column of the dataframe.
inp_df[['salary']].std()

salary         8.194421e-01

# Determines Standard Deviation for every row present in the dataframe. But it
# does this for the entire row and it will output values in a single column.
# One std value for each row.
inp_df.std(axis=1)

0       4.374107e+12
1       4.377543e+12
2       4.374026e+12
3       4.374046e+12
4       4.374112e+12
5       4.373926e+12

When I run the command below, I get NaN for every record. Is there a way to resolve this?

# Trying to determine standard deviation only for the "salary" column at the
# row level.
inp_df[['salary']].std(axis = 1)

0      NaN
1      NaN
2      NaN
3      NaN
4      NaN

asked Dec 17 '18 05:12

JKC

1 Answer

This is expected. The documentation for DataFrame.std says:

Normalized by N-1 by default. This can be changed using the ddof argument

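With the default ddof=1, the sum of squared deviations is divided by N - 1. If there is only one element, N - 1 is 0, so you are dividing by zero and pandas returns NaN. A one-column DataFrame has exactly one value per row, so the sample standard deviation across columns (axis=1) is a missing value for every row.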
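
To make the division by N - 1 concrete, here is a minimal pure-Python sketch of the formula. The function name std_with_ddof is hypothetical and not part of pandas; returning NaN when the denominator is not positive is an assumption chosen to mirror the NaN output seen in the question.

```python
import math

def std_with_ddof(values, ddof=1):
    """Standard deviation with an explicit ddof (illustrative helper, not a pandas API)."""
    n = len(values)
    denominator = n - ddof
    if denominator <= 0:
        # Assumption: mimic the NaN seen above instead of raising ZeroDivisionError.
        return float("nan")
    mean = sum(values) / n
    squared_deviations = sum((v - mean) ** 2 for v in values)
    return math.sqrt(squared_deviations / denominator)

three_values = std_with_ddof([10, 20, 30])       # N - 1 = 2
one_value = std_with_ddof([10])                  # N - 1 = 0
one_value_ddof0 = std_with_ddof([10], ddof=0)    # N - 0 = 1

print(three_values)
print(one_value)
print(one_value_ddof0)
```

A single value has nothing to deviate from, so the only choices are NaN (ddof=1) or a trivial 0.0 (ddof=0).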

Sample:

import pandas as pd

# Small sample frame with the same column names as in the question
inp_df = pd.DataFrame({'salary':[10,20,30],
                       'num_months':[1,2,3],
                       'no_of_hours':[2,5,6]})
print (inp_df)
   salary  num_months  no_of_hours
0      10           1            2
1      20           2            5
2      30           3            6

Selecting one column with single [] returns a Series:

print (inp_df['salary'])
0    10
1    20
2    30
Name: salary, dtype: int64

The std of a Series is a scalar:

print (inp_df['salary'].std())
10.0

Selecting one column with double [] returns a one-column DataFrame:

print (inp_df[['salary']])
   salary
0      10
1      20
2      30

The std of that DataFrame along the index (axis=0, the default) is a one-element Series:

print (inp_df[['salary']].std())
#same as
#print (inp_df[['salary']].std(axis=0))
salary    10.0
dtype: float64

The std of that DataFrame across columns (axis=1) is all NaN, because each row holds a single value:

print (inp_df[['salary']].std(axis = 1))
0   NaN
1   NaN
2   NaN
dtype: float64

If the default ddof=1 is changed to ddof=0, each row's std is 0.0 instead:

print (inp_df[['salary']].std(axis = 1, ddof=0))
0    0.0
1    0.0
2    0.0
dtype: float64
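
The ddof default also explains a common mismatch when comparing against NumPy: pandas defaults to ddof=1, while numpy.std defaults to ddof=0. A short sketch of the difference on the sample salary values (the variable names are only illustrative):

```python
import numpy as np
import pandas as pd

salary = pd.Series([10, 20, 30], name="salary")

pandas_default = salary.std()                      # pandas: ddof=1 (sample std)
numpy_default = np.std(salary.to_numpy())          # NumPy: ddof=0 (population std)
numpy_sample = np.std(salary.to_numpy(), ddof=1)   # NumPy with ddof=1 matches pandas

print(pandas_default, numpy_default, numpy_sample)
```

Passing ddof explicitly on both sides avoids relying on either library's default.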

If you want the std across two or more columns:

#select 2 columns
print (inp_df[['salary', 'num_months']])
   salary  num_months
0      10           1
1      20           2
2      30           3

#std by index
print (inp_df[['salary','num_months']].std())
salary        10.0
num_months     1.0
dtype: float64

#std by columns (note: this uses salary and no_of_hours)
print (inp_df[['salary','no_of_hours']].std(axis = 1))
0     5.656854
1    10.606602
2    16.970563
dtype: float64
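
If row-level std is computed in several places, one option is to wrap the call in a small guard that fails loudly instead of silently returning NaN. This is only a sketch: row_std is a hypothetical helper, not a pandas function, and raising ValueError is a design choice made here for illustration.

```python
import pandas as pd

def row_std(df, columns, ddof=1):
    """Row-wise std over the given columns (hypothetical helper, not a pandas API)."""
    columns = list(columns)
    if len(columns) - ddof <= 0:
        # With too few columns, every row would come back as NaN.
        raise ValueError(
            f"need more than {ddof} column(s) for ddof={ddof}, got {len(columns)}"
        )
    return df[columns].std(axis=1, ddof=ddof)

inp_df = pd.DataFrame({'salary': [10, 20, 30],
                       'num_months': [1, 2, 3],
                       'no_of_hours': [2, 5, 6]})

two_cols = row_std(inp_df, ['salary', 'no_of_hours'])
print(two_cols)

try:
    row_std(inp_df, ['salary'])
except ValueError as exc:
    print(exc)
```

With two columns this returns the same values as the std(axis=1) call above; with one column it raises rather than producing a column of NaN.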

answered Sep 30 '22 19:09

jezrael
