
Pandas: extended 'describe' that also counts the null values

Tags:

python

pandas

I have a large data frame with 450 columns and 550,000 rows. The columns are:

  • 73 float columns
  • 30 date columns
  • the remaining columns are of dtype object

I would like to build a description of my variables that goes beyond the usual describe output and includes additional statistics in the same matrix. The final result would be one description matrix covering all 450 variables, with the following details for each:

  • dtype
  • count
  • count of null values
  • % of null values
  • max
  • min
  • 50%
  • 75%
  • 25%
  • ...

For now, I just have a basic call that describes my data like this:

Dataframe.describe(include = 'all')

Is there a function or method that produces this more extensive description?

Thanks.

Ib D asked Nov 06 '18 14:11


1 Answer

You need to compute the extra statistics yourself and add them as new rows to the DataFrame returned by describe:

Note: the first row of the final DataFrame is count, which comes from the count function and therefore counts only the non-NaN values.

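import numpy as np
import pandas as pd

# Small sample frame; column B contains two missing values
df = pd.DataFrame({
    'A': list('abcdef'),
    'B': [4, np.nan, np.nan, 5, 5, 4],
    'C': [7, 8, 9, 4, 2, 3],
    'D': [1, 3, 5, 7, 1, 0],
    'E': [5, 3, 6, 9, 2, 4],
    'F': list('aaabbb')
})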

print (df)
   A    B  C  D  E  F
0  a  4.0  7  1  5  a
1  b  NaN  8  3  3  a
2  c  NaN  9  5  6  a
3  d  5.0  4  7  9  b
4  e  5.0  2  1  2  b
5  f  4.0  3  0  4  b

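# Start from the standard summary of all columns
df1 = df.describe(include='all')

# Add extra rows: dtype, total number of rows, and share of null values
df1.loc['dtype'] = df.dtypes
df1.loc['size'] = len(df)
df1.loc['% count'] = df.isnull().mean()  # fraction between 0 and 1; multiply by 100 for a percentage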

print (df1)
              A         B        C        D        E       F
count         6         4        6        6        6       6
unique        6       NaN      NaN      NaN      NaN       2
top           e       NaN      NaN      NaN      NaN       b
freq          1       NaN      NaN      NaN      NaN       3
mean        NaN       4.5      5.5  2.83333  4.83333     NaN
std         NaN   0.57735  2.88097  2.71416  2.48328     NaN
min         NaN         4        2        0        2     NaN
25%         NaN         4     3.25        1     3.25     NaN
50%         NaN       4.5      5.5        2      4.5     NaN
75%         NaN         5     7.75      4.5     5.75     NaN
max         NaN         5        9        7        9     NaN
dtype    object   float64    int64    int64    int64  object
size          6         6        6        6        6       6
% count       0  0.333333        0        0        0       0
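
The question also asks for the number of null values, which the rows above do not include. As a rough sketch of how the same idea could be wrapped in a reusable helper: the function name extended_describe and the row labels 'null count' and 'null %' are made up for this illustration and are not part of pandas.

```python
import numpy as np
import pandas as pd


def extended_describe(df):
    """Return describe(include='all') plus dtype, size and null statistics.

    Hypothetical helper: the name and the extra row labels are illustrative.
    """
    # Cast to object so rows holding strings and numbers can sit side by side
    summary = df.describe(include='all').astype(object)

    # Count nulls once and derive the percentage from that count
    null_count = df.isnull().sum()

    summary.loc['dtype'] = df.dtypes.astype(str)
    summary.loc['size'] = len(df)
    summary.loc['null count'] = null_count
    summary.loc['null %'] = null_count / len(df) * 100
    return summary


df = pd.DataFrame({
    'A': list('abcdef'),
    'B': [4, np.nan, np.nan, 5, 5, 4],
    'F': list('aaabbb'),
})

result = extended_describe(df)
print(result.loc[['count', 'size', 'null count', 'null %']])
```

With 450 columns, result.T is probably easier to read, since it gives one row per variable and one column per statistic.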
jezrael answered Oct 13 '22 12:10