import numpy as np
import pandas as pd

df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                   'B' : ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                   'C' : [np.nan, 'bla2', np.nan, 'bla3', np.nan, np.nan, np.nan, np.nan]})
Output:
     A      B     C
0  foo    one   NaN
1  bar    one  bla2
2  foo    two   NaN
3  bar  three  bla3
4  foo    two   NaN
5  bar    two   NaN
6  foo    one   NaN
7  foo  three   NaN

I would like to use groupby to count the number of NaNs in column C for each combination of A and B.
Expected Output (EDIT):
     A      B     C  D
0  foo    one   NaN  2
1  bar    one  bla2  0
2  foo    two   NaN  2
3  bar  three  bla3  0
4  foo    two   NaN  2
5  bar    two   NaN  1
6  foo    one   NaN  2
7  foo  three   NaN  1
Currently I am trying this:
df['count']=df.groupby(['A'])['B'].isnull().transform('sum')
But this is not working...
Thank You
Use count() by column name: use pandas DataFrame.groupby() to group the rows by a column, then call count() to get the number of values in each group, ignoring None and NaN. It works with non-floating-point data as well.
Count non-NA cells for each column or row. The values None , NaN , NaT , and optionally numpy.
What is the GroupBy function? Pandas' GroupBy is a powerful and versatile function in Python. It allows you to split your data into separate groups to perform computations for better analysis.
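Because count() skips NaN while size() counts every row, the number of NaN values per group can also be obtained by subtracting one from the other. This is a minimal sketch using the frame from the question; the names `non_null`, `total` and `nan_counts` are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': [np.nan, 'bla2', np.nan, 'bla3', np.nan, np.nan, np.nan, np.nan],
})

grouped = df.groupby(['A', 'B'])['C']
non_null = grouped.count()  # values per group, NaN excluded
total = grouped.size()      # rows per group, NaN included

# NaN count per (A, B) group = all rows minus non-NaN values
nan_counts = (total - non_null).rename('nan_count')
print(nan_counts)
```

This gives one row per (A, B) combination rather than one value per original row, so it fits a summary table better than a new column.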
I think you need groupby with sum of NaN values:
df2 = df.C.isnull().groupby([df['A'],df['B']]).sum().astype(int).reset_index(name='count')
print(df2)

     A      B  count
0  bar    one      0
1  bar  three      0
2  bar    two      1
3  foo    one      2
4  foo  three      1
5  foo    two      2
If you need to filter first, add boolean indexing:
df = df[df['A'] == 'foo']
df2 = df.C.isnull().groupby([df['A'],df['B']]).sum().astype(int)
print(df2)

A    B
foo  one      2
     three    1
     two      2
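Or simpler, which works here only because every C value in the foo rows is NaN (value_counts counts rows, not NaN values):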
df = df[df['A'] == 'foo']
df2 = df['B'].value_counts()
print(df2)

one      2
two      2
three    1
Name: B, dtype: int64
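Note that value_counts() counts rows per value of B, so it matches the NaN count only when every C in the filtered rows is missing. A small sketch with a hypothetical frame (`demo` is not part of the question's data), where one foo row has a value in C, shows the two results diverging:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the second 'foo'/'one' row has a non-NaN value in C.
demo = pd.DataFrame({
    'A': ['foo', 'foo', 'foo'],
    'B': ['one', 'one', 'two'],
    'C': [np.nan, 'x', np.nan],
})

row_counts = demo['B'].value_counts()                     # rows per B value
nan_counts = demo['C'].isnull().groupby(demo['B']).sum()  # NaNs in C per B value

print(row_counts)
print(nan_counts)
```

For B == 'one' the row count is 2 while the NaN count is 1, so the shortcut is only safe on data like the sample above.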
EDIT: The solution is very similar, just use transform instead:
df['D'] = df.C.isnull().groupby([df['A'],df['B']]).transform('sum').astype(int)
print(df)

     A      B     C  D
0  foo    one   NaN  2
1  bar    one  bla2  0
2  foo    two   NaN  2
3  bar  three  bla3  0
4  foo    two   NaN  2
5  bar    two   NaN  1
6  foo    one   NaN  2
7  foo  three   NaN  1
Similar solution:
df['D'] = df.C.isnull()
df['D'] = df.groupby(['A','B'])['D'].transform('sum').astype(int)
print(df)

     A      B     C  D
0  foo    one   NaN  2
1  bar    one  bla2  0
2  foo    two   NaN  2
3  bar  three  bla3  0
4  foo    two   NaN  2
5  bar    two   NaN  1
6  foo    one   NaN  2
7  foo  three   NaN  1
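As far as I can tell, the attempt in the question fails for two reasons: a grouped column does not expose isnull(), and it groups by A only and looks at B rather than C. The sketch below, run against the original unfiltered frame, checks the first point with hasattr and then builds the NaN mask before grouping; `expected` is just the D column from the desired output.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': [np.nan, 'bla2', np.nan, 'bla3', np.nan, np.nan, np.nan, np.nan],
})

# The grouped column has no isnull(), so the mask cannot be taken after grouping.
has_isnull = hasattr(df.groupby(['A'])['B'], 'isnull')
print(has_isnull)

# Build the boolean mask on C first, then group it by both key columns.
is_nan = df['C'].isnull()
df['D'] = is_nan.groupby([df['A'], df['B']]).transform('sum').astype(int)

expected = [2, 0, 2, 0, 2, 1, 2, 1]  # D column from the expected output
print(df['D'].tolist() == expected)
```

transform returns a result aligned with the original index, which is why it can be assigned straight back as a column, whereas sum() alone returns one row per group.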