
Pandas count null values in a groupby function


import numpy as np
import pandas as pd

df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                   'B' : ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                   'C' : [np.nan, 'bla2', np.nan, 'bla3', np.nan, np.nan, np.nan, np.nan]})

Output:

     A      B     C
0  foo    one   NaN
1  bar    one  bla2
2  foo    two   NaN
3  bar  three  bla3
4  foo    two   NaN
5  bar    two   NaN
6  foo    one   NaN
7  foo  three   NaN

I would like to use groupby to count the number of NaN values in column C for each combination of A and B.

Expected Output (EDIT):

     A      B     C    D
0  foo    one   NaN    2
1  bar    one  bla2    0
2  foo    two   NaN    2
3  bar  three  bla3    0
4  foo    two   NaN    2
5  bar    two   NaN    1
6  foo    one   NaN    2
7  foo  three   NaN    1

Currently I am trying this:

df['count']=df.groupby(['A'])['B'].isnull().transform('sum') 

But this is not working...

Thank You

Stefan asked Apr 10 '17 11:04



1 Answer

I think you need groupby with sum of NaN values:

df2 = df.C.isnull().groupby([df['A'],df['B']]).sum().astype(int).reset_index(name='count')
print(df2)

     A      B  count
0  bar    one      0
1  bar  three      0
2  bar    two      1
3  foo    one      2
4  foo  three      1
5  foo    two      2
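As a cross-check, the same per-group counts can also be derived as the group size minus the number of non-null values. This is a different technique from the one above, sketched here on the sample frame from the question; the variable names `grouped` and `null_counts` are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample frame copied from the question.
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': [np.nan, 'bla2', np.nan, 'bla3', np.nan, np.nan, np.nan, np.nan],
})

grouped = df.groupby(['A', 'B'])['C']

# size() counts every row in a group, count() only the non-null ones,
# so the difference is the number of missing values per group.
null_counts = (grouped.size() - grouped.count()).reset_index(name='count')
print(null_counts)
```

Both approaches should agree on this data; the `isnull().groupby(...).sum()` form is usually easier to extend to `transform`.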

If you need to filter first, add boolean indexing:

df = df[df['A'] == 'foo']
df2 = df.C.isnull().groupby([df['A'],df['B']]).sum().astype(int)
print(df2)

A    B
foo  one      2
     three    1
     two      2

Or simpler, since every C value in the foo rows is NaN, so counting rows per B gives the same numbers:

df = df[df['A'] == 'foo']
df2 = df['B'].value_counts()
print(df2)

one      2
two      2
three    1
Name: B, dtype: int64
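Note that value_counts counts rows, not missing values, so it matches the NaN count only when every C value in the filtered rows is NaN. A small sketch on the bar rows of the question's sample data, where the two results differ (the names `bar`, `row_counts` and `nan_counts` are only illustrative):

```python
import numpy as np
import pandas as pd

# Sample frame copied from the question.
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': [np.nan, 'bla2', np.nan, 'bla3', np.nan, np.nan, np.nan, np.nan],
})

bar = df[df['A'] == 'bar']

# Rows per value of B, regardless of what C holds.
row_counts = bar['B'].value_counts().sort_index()

# Missing C values per value of B.
nan_counts = bar['C'].isna().groupby(bar['B']).sum().astype(int)

print(row_counts)
print(nan_counts)
```

For bar, each B value has one row, but only the two row has a missing C.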

EDIT: The solution is very similar, just use transform:

df['D'] = df.C.isnull().groupby([df['A'],df['B']]).transform('sum').astype(int)
print(df)

     A      B     C  D
0  foo    one   NaN  2
1  bar    one  bla2  0
2  foo    two   NaN  2
3  bar  three  bla3  0
4  foo    two   NaN  2
5  bar    two   NaN  1
6  foo    one   NaN  2
7  foo  three   NaN  1

Similar solution:

df['D'] = df.C.isnull()
df['D'] = df.groupby(['A','B'])['D'].transform('sum').astype(int)
print(df)

     A      B     C  D
0  foo    one   NaN  2
1  bar    one  bla2  0
2  foo    two   NaN  2
3  bar  three  bla3  0
4  foo    two   NaN  2
5  bar    two   NaN  1
6  foo    one   NaN  2
7  foo  three   NaN  1
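For convenience, here is an end-to-end sketch of the transform approach that can be run as is. It assumes the unfiltered sample frame from the question, uses `isna()` (an alias of `isnull()`), and builds the result with `assign` so the original frame is left untouched; the name `result` is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample frame copied from the question (no filtering applied).
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': [np.nan, 'bla2', np.nan, 'bla3', np.nan, np.nan, np.nan, np.nan],
})

# Flag missing C values, sum the flags within each (A, B) group,
# and broadcast the group total back to every row of that group.
nan_per_group = (
    df['C'].isna()
    .groupby([df['A'], df['B']])
    .transform('sum')
    .astype(int)
)

result = df.assign(D=nan_per_group)
print(result)
```

The D column should match the expected output given in the question.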

jezrael answered Oct 13 '22 02:10