Mean of all the columns of a pandas DataFrame?

I'm trying to calculate the mean of every column of a DataFrame, but a value in column B of the sixth row (index 5) seems to prevent the mean of column C from being calculated. Why?

import pandas as pd
from decimal import Decimal
d = [
    {'A': 2, 'B': None, 'C': Decimal('628.00')},
    {'A': 1, 'B': None, 'C': Decimal('383.00')},
    {'A': 3, 'B': None, 'C': Decimal('651.00')},
    {'A': 2, 'B': None, 'C': Decimal('575.00')},
    {'A': 4, 'B': None, 'C': Decimal('1114.00')},
    {'A': 1, 'B': 'TEST', 'C': Decimal('241.00')},
    {'A': 2, 'B': None, 'C': Decimal('572.00')},
    {'A': 4, 'B': None, 'C': Decimal('609.00')},
    {'A': 3, 'B': None, 'C': Decimal('820.00')},
    {'A': 5, 'B': None, 'C': Decimal('1223.00')}
]

df = pd.DataFrame(d)

In : df
Out:
   A     B        C
0  2  None   628.00
1  1  None   383.00
2  3  None   651.00
3  2  None   575.00
4  4  None  1114.00
5  1  TEST   241.00
6  2  None   572.00
7  4  None   609.00
8  3  None   820.00
9  5  None  1223.00

Tests:

# no mean for C column
In : df.mean()
Out:
A    2.7
dtype: float64

# mean for C column when row 6 is left out of the DF
In : df.head(5).mean()
Out:
A      2.4
B      NaN
C    670.2
dtype: float64

# no mean for C column when row 6 is part of the DF
In : df.head(6).mean()
Out:
A    2.166667
dtype: float64

dtypes:

In : df.dtypes
Out:
A     int64
B    object
C    object
dtype: object

In : df.head(5).dtypes
Out:
A     int64
B    object
C    object
dtype: object
Michael asked Nov 20 '15 20:11




1 Answer

You can select specific columns if you only need the ones that hold numbers:

In [90]: df[['A','C']].mean()
Out[90]: 
A      2.7
C    681.6
dtype: float64

Or change the column's type, as @jezrael advises in a comment:

df['C'] = df['C'].astype(float)
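
Here is a minimal sketch of that conversion on the question's data, rebuilt column by column for brevity. Passing numeric_only=True is my own addition, not part of the original suggestion: it skips the string column B explicitly rather than relying on the default, which has changed between pandas versions.

```python
import pandas as pd
from decimal import Decimal

# The question's data, rebuilt column by column.
df = pd.DataFrame({
    'A': [2, 1, 3, 2, 4, 1, 2, 4, 3, 5],
    'B': [None] * 5 + ['TEST'] + [None] * 4,
    'C': [Decimal(s) for s in (
        '628.00', '383.00', '651.00', '575.00', '1114.00',
        '241.00', '572.00', '609.00', '820.00', '1223.00')],
})

# Decimal values are stored with dtype object, so convert them explicitly.
df['C'] = df['C'].astype(float)

# numeric_only=True leaves out column B, which still has dtype object.
means = df.mean(numeric_only=True)
print(means)
```

With C stored as float64, both A and C show up in the result and B is left out.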

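Probably df.mean tries to convert every object column to numeric and, if that fails, falls back to calculating the mean only for the columns that are actually numeric. That would explain why B and C still appear for df.head(5), where B is all None, but disappear as soon as the 'TEST' row is included.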
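
To see which columns pandas itself counts as "actual numbers", you can check the dtypes before and after the conversion. This is only a sketch on a two-row stand-in for the question's data (the small frame is an assumption for illustration):

```python
import pandas as pd
from decimal import Decimal

# Two-row stand-in for the question's DataFrame.
df = pd.DataFrame({
    'A': [2, 1],
    'B': [None, 'TEST'],
    'C': [Decimal('628.00'), Decimal('241.00')],
})

# C is not listed here: Decimal values are stored with dtype object.
before = list(df.select_dtypes(include='number').columns)

# After the cast, C has dtype float64 and counts as numeric.
df['C'] = df['C'].astype(float)
after = list(df.select_dtypes(include='number').columns)

print(before)
print(after)
```

Only A is numeric before the cast; A and C are numeric after it.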

Anton Protopopov answered Oct 20 '22 19:10
