Can I use pandas pivot_table to aggregate over a column with missing values and have those missing values included as separate category?
In:
df = pd.DataFrame({'a': pd.Series(['X', 'X', 'Y', 'Y', 'N', 'N'], dtype='category'),
'b': pd.Series([None, None, 'd', 'd', 'd', 'd'], dtype='category')})
Out:
a b
0 X NaN
1 X NaN
2 Y d
3 Y d
4 N d
5 N d
In:
df.groupby('a')['b'].apply(lambda x: x.value_counts(dropna=False)).unstack(1)
Out:
NaN d
a
N NaN 2.0
X 2.0 0.0
Y NaN 2.0
Can I achieve the same result using pandas pivot_table? If yes than how? Thanks.
Checking for missing values using isnull() and notnull() In order to check missing values in Pandas DataFrame, we use a function isnull() and notnull(). Both function help in checking whether a value is NaN or not. These function can also be used in Pandas Series in order to find null values in a series.
When applied to a dataframe, Pandas isna() function return boolean dataframe with True with the element is missing value and False when it is not a missing value. We can use Pandas' sum() function to get the counts of missing values per each column in the dataframe. By default, Pandas sum() adds across columns.
Pivot table in pandas is an excellent tool to summarize one or more numeric variable based on two other categorical variables. Pivot tables in pandas are popularly seen in MS Excel files. In python, Pivot tables of pandas dataframes can be created using the command: pandas. pivot_table .
For some unknown reason, dtype="category"
does not work with pivot_table()
when counting NaN
values. Casting them to regular strings enables regular pivot_table(aggfunc="size")
.
df.astype(str).pivot_table(index="a", columns="b", aggfunc="size")
b d nan
a
N 2.0 NaN
X NaN 2.0
Y 2.0 NaN
One can optionally do .fillna(0)
to replace nan
s with 0
s
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With