Calculate dataframe mean by skipping certain values in Python / Pandas

I need to calculate the mean of the first column of the dataframe and I can do that using the mean() method. The problem: Sometimes, there are -9999 values in the data denoting missing observations. I know that NaN values are inherently skipped when calculating the mean in Pandas, but this is not the case with -9999 values of course.

Here is the code I tried. It calculates the mean of the column, but it includes the -9999 value in the calculation:

import pandas
# Lists instead of sets, so the column order is deterministic
df = pandas.DataFrame([[2, 4, 6], [-9999, 1, 3]])
df[0].mean(skipna=-9999)

But it yields a mean of -4998.5, which clearly includes the -9999 in the calculation.

multigoodverse asked Mar 17 '23 06:03


2 Answers

The skipna arg is a boolean specifying whether or not to exclude NA/null values, not which values to ignore:

skipna : boolean, default True
    Exclude NA/null values. If an entire row/column is NA, the result
    will be NA
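
To make the boolean behaviour concrete, here is a minimal sketch. The series values are made up for illustration and are not from the question; it only shows that skipna toggles NaN handling and has no effect on an ordinary number such as -9999:

```python
import numpy as np
import pandas as pd

# Illustrative data (an assumption, not taken from the question)
s = pd.Series([2.0, np.nan, 4.0])

mean_skipping_nan = s.mean()            # default skipna=True: NaN is ignored
mean_with_nan = s.mean(skipna=False)    # NaN propagates into the result

# A sentinel such as -9999 is just a number, so it is always averaged in
sentinel = pd.Series([2, -9999])
sentinel_mean = sentinel.mean()
```

Only real NA values are affected by skipna, which is why the sentinel has to be converted to NaN first.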

Assuming I understand what you're trying to do, you could replace -9999 by NaN:

In [41]: df[0].replace(-9999, np.nan)
Out[41]: 
0     2
1   NaN
Name: 0, dtype: float64

In [42]: df[0].replace(-9999, np.nan).mean()
Out[42]: 2.0
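
For reference, the same idea as a self-contained script. This is a sketch under a few assumptions: the frame is built from lists so that column 0 holds 2 and -9999, the name MISSING is introduced here for illustration, and Series.mask is shown as one possible alternative to replace:

```python
import numpy as np
import pandas as pd

MISSING = -9999  # sentinel used for missing observations (name is illustrative)

df = pd.DataFrame([[2, 4, 6], [-9999, 1, 3]])

# Turn the sentinel into NaN, then let mean() skip it as usual
first_col_mean = df[0].replace(MISSING, np.nan).mean()

# Alternative: mask() sets entries to NaN wherever the condition holds
masked_mean = df[0].mask(df[0] == MISSING).mean()

# The same replacement works on the whole frame, giving one mean per column
column_means = df.replace(MISSING, np.nan).mean()
```

Neither call modifies df in place; assign the result back if the cleaned data is needed later.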
DSM answered Apr 25 '23 14:04


skipna is meant to be True or False, not a value to be skipped.

When reading your data, normalize it by replacing -9999 with NaN.
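
If the data comes from a CSV file, one way to do this at read time is the na_values parameter of pandas.read_csv. The sketch below assumes a CSV source, which the question does not state, and uses made-up inline data with hypothetical column names a and b:

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for the real data file
csv_text = "a,b\n2,4\n-9999,1\n"

# na_values tells the parser to treat -9999 as missing while reading
df = pd.read_csv(io.StringIO(csv_text), na_values=[-9999])

mean_a = df["a"].mean()  # the former -9999 is now NaN and is skipped
```

With the sentinel handled on load, every later mean(), sum(), or similar call skips those entries without extra work.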

mnagel answered Apr 25 '23 13:04