I need to calculate the mean of the first column of the dataframe, and I can do that using the mean() method.
The problem: sometimes the data contains -9999 values that denote missing observations.
I know that pandas skips NaN values by default when calculating the mean, but of course that does not apply to -9999 values.
Here is the code I tried. It calculates the mean of the column, but it includes the -9999 value in the calculation:
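import pandas
df = pandas.DataFrame([[2, 4, 6], [-9999, 1, 3]])  # lists, not sets: sets have no defined order
df[0].mean(skipna=-9999)  # attempt to "skip" -9999; this is not what skipna does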
but it yields a mean of -4998.5, which is (2 + (-9999)) / 2, so the -9999 is clearly being included in the calculation.
The skipna argument is a boolean specifying whether or not to exclude NA/null values, not which values to ignore:
skipna : boolean, default True
    Exclude NA/null values. If an entire row/column is NA, the result will be NA
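To make the documented behaviour concrete, here is a small sketch (the Series values are made up for illustration) showing what skipna does with a real NaN, and why a sentinel such as -9999 is never affected by it:

```python
import numpy as np
import pandas as pd

s = pd.Series([2.0, np.nan, 4.0])

# Default skipna=True: the NaN is excluded, so the mean is (2 + 4) / 2
print(s.mean())              # 3.0

# skipna=False: any NaN makes the result NaN
print(s.mean(skipna=False))  # nan

# -9999 is an ordinary number to pandas, so skipna never skips it
t = pd.Series([2.0, -9999.0])
print(t.mean())              # -4998.5
```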
Assuming I understand what you're trying to do, you could replace -9999 with NaN:
In [41]: df[0].replace(-9999, np.nan)
Out[41]:
0 2
1 NaN
Name: 0, dtype: float64
In [42]: df[0].replace(-9999, np.nan).mean()
Out[42]: 2.0
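The replace-then-mean approach above can also be written in a couple of other ways. The sketch below compares it with boolean indexing and Series.mask; those two variants are my own additions rather than part of the original answer, and the data is assumed so that column 0 holds 2 and -9999, matching the output shown above:

```python
import numpy as np
import pandas as pd

# Assumed example data: column 0 holds 2 and the sentinel -9999
df = pd.DataFrame([[2, 4, 6], [-9999, 1, 3]])

# 1) Replace the sentinel with NaN, then rely on skipna=True (the answer's approach)
m1 = df[0].replace(-9999, np.nan).mean()

# 2) Boolean indexing: keep only the values that are not the sentinel
m2 = df[0][df[0] != -9999].mean()

# 3) Series.mask: entries where the condition is True become NaN
m3 = df[0].mask(df[0] == -9999).mean()

print(m1, m2, m3)  # 2.0 2.0 2.0
```

Options 1 and 3 keep the Series the same length (useful if you need the NaNs later), while option 2 simply drops the sentinel rows before averaging.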
skipna is meant to be True or False, not a value to be skipped.
When reading your data, normalize it by replacing -9999 with NaN.
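If the data comes from a CSV file (an assumption; the question doesn't say where it comes from), pandas.read_csv can do this normalization at load time through its na_values parameter. A minimal sketch, with an in-memory string standing in for the real file and hypothetical column names a, b, c:

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for the real data file
raw = io.StringIO("a,b,c\n2,4,6\n-9999,1,3\n")

# na_values lists extra values that read_csv should parse as NaN
df = pd.read_csv(raw, na_values=[-9999])

print(df["a"].mean())  # 2.0
```

After loading this way, the default skipna=True handles the missing observations with no further cleanup.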