How can I calculate the standard deviation for each row of a DataFrame?

df:  

name   group   S1   S2  S3        
A      mn      1    2   8         
B      mn      4    3   5        
C      kl      5    8   2        
D      kl      6    5   5         
E      fh      7    1   3         

output: 

std (S1,S2,S3)
3.78
1
3
0.57
3.05

This works for getting the standard deviation of a column:

import numpy
numpy.std(df['S1'])

I want to do the same for rows.
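A minimal sketch to make the question reproducible: it rebuilds the example DataFrame from the table above (values copied from the table; a dict of lists is just one way to build it) and runs the single-column NumPy call on S1. The name `col_std` is a hypothetical variable used for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the example DataFrame from the table in the question
df = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "group": ["mn", "mn", "kl", "kl", "fh"],
    "S1": [1, 4, 5, 6, 7],
    "S2": [2, 3, 8, 5, 1],
    "S3": [8, 5, 2, 5, 3],
})

# Standard deviation of one numeric column; np.std defaults to ddof=0
col_std = np.std(df["S1"])
print(df)
print(col_std)
```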

asked Jul 13 '16 at 20:07 by NamAshena


1 Answer

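You can use DataFrame.std. Older pandas versions silently omitted the non-numeric columns (name and group); since pandas 2.0 they are no longer dropped by default and raise an error, so pass numeric_only=True: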

print(df.std(numeric_only=True))
S1    2.302173
S2    2.774887
S3    2.302173
dtype: float64
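If you prefer not to rely on numeric_only, a sketch of an alternative is to select the numeric columns explicitly with DataFrame.select_dtypes before reducing. The DataFrame is rebuilt from the question's table so the snippet runs on its own; `col_std` is a hypothetical name.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "group": ["mn", "mn", "kl", "kl", "fh"],
    "S1": [1, 4, 5, 6, 7],
    "S2": [2, 3, 8, 5, 1],
    "S3": [8, 5, 2, 5, 3],
})

# Keep only numeric columns, then take the sample std (ddof=1) of each column
col_std = df.select_dtypes(include="number").std()
print(col_std)
```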

If you need the std of each row (computed across the columns), use axis=1:

print(df.std(axis=1, numeric_only=True))
0    3.785939
1    1.000000
2    3.000000
3    0.577350
4    3.055050
dtype: float64
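To get the output the question asks for, one option is to store the row-wise result as a new column. This sketch assumes the new column is called `std` (a hypothetical name) and rounds to two decimals for display. Rounding gives 3.79, 1.0, 3.0, 0.58 and 3.06, whereas the figures in the question appear to be truncated rather than rounded.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "group": ["mn", "mn", "kl", "kl", "fh"],
    "S1": [1, 4, 5, 6, 7],
    "S2": [2, 3, 8, 5, 1],
    "S3": [8, 5, 2, 5, 3],
})

# Row-wise (axis=1) sample std over the score columns only,
# stored as a new column and rounded to two decimals
df["std"] = df[["S1", "S2", "S3"]].std(axis=1).round(2)
print(df)
```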

If you need only some of the numeric columns, select a subset first:

print(df[['S1','S2']].std())
S1    2.302173
S2    2.774887
dtype: float64
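When the score columns follow a naming pattern, DataFrame.filter with a regular expression can pick them without listing each name. The pattern `^S\d+$` is an assumption based on the column names in this example; `score_cols` and `row_std` are hypothetical names.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "group": ["mn", "mn", "kl", "kl", "fh"],
    "S1": [1, 4, 5, 6, 7],
    "S2": [2, 3, 8, 5, 1],
    "S3": [8, 5, 2, 5, 3],
})

# Select columns named S1, S2, ... (assumed naming pattern)
score_cols = df.filter(regex=r"^S\d+$")
# Sample std (ddof=1) of each row over the selected columns
row_std = score_cols.std(axis=1)
print(row_std)
```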

Note that the result differs from numpy.std because the two libraries use a different default for the ddof (Delta Degrees of Freedom) parameter:

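  • pandas uses ddof=1 by default (sample standard deviation, divides by N - 1)
  • numpy uses ddof=0 by default (population standard deviation, divides by N)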
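Both conventions use the same formula, std = sqrt(sum((x_i - mean)^2) / (N - ddof)); only the divisor changes. A small worked example for the first row (1, 2, 8), computed by hand and cross-checked against Python's statistics module, where stdev is the sample (ddof=1) version and pstdev the population (ddof=0) version:

```python
import statistics

row0 = [1, 2, 8]  # S1, S2, S3 of the first row (name A)

mean = sum(row0) / len(row0)
ss = sum((x - mean) ** 2 for x in row0)  # sum of squared deviations

# ddof=1: divide by N - 1 (sample std), the pandas default
sample_std = (ss / (len(row0) - 1)) ** 0.5
# ddof=0: divide by N (population std), the NumPy default
population_std = (ss / len(row0)) ** 0.5

# The standard library exposes the same two conventions
assert abs(sample_std - statistics.stdev(row0)) < 1e-12
assert abs(population_std - statistics.pstdev(row0)) < 1e-12
print(round(sample_std, 6), round(population_std, 6))  # prints: 3.785939 3.091206
```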

So the outputs differ:

# ddof=1 (pandas default)
print(df.std(axis=1, numeric_only=True))
0    3.785939
1    1.000000
2    3.000000
3    0.577350
4    3.055050
dtype: float64

# ddof=0 (numpy default); numpy is imported as np
import numpy as np
print(np.std(df[['S1','S2','S3']], axis=1))
0    3.091206
1    0.816497
2    2.449490
3    0.471405
4    2.494438
dtype: float64
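If you pass a plain NumPy array instead of a DataFrame, np.std returns an ndarray without the row labels. A sketch, assuming only S1 to S3 are wanted, that converts them with to_numpy and re-attaches the index afterwards; `arr`, `pop_std`, `sample_std` and `pop_series` are hypothetical names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "group": ["mn", "mn", "kl", "kl", "fh"],
    "S1": [1, 4, 5, 6, 7],
    "S2": [2, 3, 8, 5, 1],
    "S3": [8, 5, 2, 5, 3],
})

# Convert only the numeric score columns to a plain 2-D ndarray
arr = df[["S1", "S2", "S3"]].to_numpy()

pop_std = np.std(arr, axis=1)             # ddof=0 (NumPy default)
sample_std = np.std(arr, axis=1, ddof=1)  # ddof=1 (pandas default)

# Re-attach the original row labels if a Series is needed
pop_series = pd.Series(pop_std, index=df.index)
print(pop_series)
```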

But you can easily change the default:

# same output as the pandas default (ddof=1)
print(np.std(df[['S1','S2','S3']], ddof=1, axis=1))
0    3.785939
1    1.000000
2    3.000000
3    0.577350
4    3.055050
dtype: float64

# same output as the numpy default (ddof=0)
print(df.std(ddof=0, axis=1, numeric_only=True))
0    3.091206
1    0.816497
2    2.449490
3    0.471405
4    2.494438
dtype: float64   
answered Nov 07 '22 at 17:11 by jezrael