What is the best way to account for NaN (not a number) values in a pandas DataFrame?
The following code:
import numpy as np
import pandas as pd

dfd = pd.DataFrame([1, np.nan, 3, 3, 3, np.nan], columns=['a'])
dfv = dfd.a.value_counts().sort_index()
print("nan: %d" % dfv[np.nan].sum())
print("1: %d" % dfv[1].sum())
print("3: %d" % dfv[3].sum())
print("total: %d" % dfv[:].sum())
Outputs:
nan: 0
1: 1
3: 3
total: 4
While the desired output is:
nan: 2
1: 1
3: 3
total: 6
I am using pandas 0.17 with Python 3.5.0 and Anaconda 2.4.0.
To count the NaN values in a column of a pandas DataFrame, we can use the isna() method (or its older alias isnull()) together with sum(). isna() returns a boolean frame that is True wherever a cell holds a missing value: None, NaN, NaT, and optionally numpy.inf. Calling sum() on that boolean frame adds down each column by default (axis=0), so the result is a Series with the number of missing values for each column.
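As a minimal sketch of those two steps (note that isna() only exists from pandas 0.21 onward; on 0.17, as in the question, isnull() is the equivalent call):

import numpy as np
import pandas as pd

dfd = pd.DataFrame([1, np.nan, 3, 3, 3, np.nan], columns=['a'])

mask = dfd.isnull()       # boolean DataFrame: True where a value is missing
print(mask)
print(mask.sum(axis=0))   # axis=0 is the default: sums down each column -> Series of NaN counts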
To count just null values, you can use isnull():
In [11]: dfd.isnull().sum()
Out[11]:
a    2
dtype: int64
Here a is the column name, and there are 2 occurrences of the null value in the column.
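Combining this with value_counts(), a minimal sketch that reproduces the desired output from the question (using the same dfd as above) could look like this:

import numpy as np
import pandas as pd

dfd = pd.DataFrame([1, np.nan, 3, 3, 3, np.nan], columns=['a'])
dfv = dfd.a.value_counts().sort_index()  # counts of the non-null values only
n_nan = dfd.a.isnull().sum()             # number of NaN values in the column

print("nan: %d" % n_nan)
print("1: %d" % dfv[1])
print("3: %d" % dfv[3])
print("total: %d" % (dfv.sum() + n_nan))

This prints nan: 2, 1: 1, 3: 3 and total: 6, matching the desired output above.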