Count NaN per row with Pandas

Tags: python, pandas

I'm trying to output the frequency of each value in the First_Name column of my data frame, attached to every row. So far this works, but I would also like to count NaN values, not only non-NaN values.

Below is a data frame with two columns, First_Name and Favorite_Color. I wanted to get a count for the First_Name column. When I run the code, I only get a count for non-NaN values. Is there a way to also include a count of NaN values and make that part of the data frame?

import pandas as pd

d = {
    'First_Name': ["Jared", "Lily", "Sarah", "Bill", "Bill", "Alfred", None],
    'Favorite_Color': ["Blue", "Blue", "Pink", "Red", "Yellow", "Orange", "Red"]
}

df = pd.DataFrame(data=d)

df['countNames'] = df.groupby('First_Name')['First_Name'].transform('count')

print(df)

I expected to get a count of both NaN and non-NaN values, but I only got a count for non-NaN values.

Edit: Thank you Everyone!

I really enjoyed reading everyone's answers; it's interesting to see so many different ways of solving this! I think SH-SF's answer is nice because it's a bit easier to understand, although it does rely on the numpy library.

asked Sep 20 '19 01:09 by LeonCecil

People also ask

How count NaN values in Pandas column?

To count the NaN values in a column in a Pandas DataFrame, we can use the isna() method with sum.
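As a minimal sketch of that idea, using the First_Name values from the question (the variable name nan_count is only illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    {"First_Name": ["Jared", "Lily", "Sarah", "Bill", "Bill", "Alfred", None]}
)

# isna() marks each missing entry as True; sum() then counts the True values.
nan_count = int(df["First_Name"].isna().sum())
print(nan_count)  # 1
```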

How do you find the number of missing values in a row in Python?

To check for missing values in a Pandas DataFrame, use isnull() and notnull(). Both functions test, element by element, whether a value is NaN or not.
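For a per-row count, one option is to sum the boolean mask across columns with axis=1. The small frame below is made up for illustration and is not the question's data:

```python
import pandas as pd

# Hypothetical toy frame with missing values spread over both columns.
df = pd.DataFrame(
    {
        "First_Name": ["Jared", None, "Sarah"],
        "Favorite_Color": ["Blue", None, None],
    }
)

# axis=1 sums across the columns, giving one number per row.
missing_per_row = df.isna().sum(axis=1)
present_per_row = df.notna().sum(axis=1)

print(missing_per_row.tolist())  # [0, 2, 1]
print(present_per_row.tolist())  # [2, 0, 1]
```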

Does count function count NaN values?

No. The count() method returns the number of non-NaN values in each column, so the number of NaN values is the total number of rows minus that count. The isnull() function returns an object of the same shape containing True where a value is missing and False elsewhere.
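A short sketch of that subtraction on the question's First_Name values (the variable names are illustrative):

```python
import pandas as pd

names = pd.Series(["Jared", "Lily", "Sarah", "Bill", "Bill", "Alfred", None])

non_nan = int(names.count())   # count() skips missing values
total = len(names)             # len() counts every row
nan_total = total - non_nan    # what count() left out

print(non_nan, total, nan_total)  # 6 7 1
```

The result agrees with names.isna().sum(), which counts the missing values directly.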


1 Answer

IIUC, this should fulfill your needs.

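import numpy as np

# Number of missing names in the column
nasum = df['First_Name'].isnull().sum()
# Rows whose name is NaN come back as NaN from transform; replace them with nasum
df['countNames'] = df.groupby('First_Name')['First_Name'].transform('count').replace(np.nan, nasum)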

Or, as suggested by ALollz, the code below gives the same result:

df['countNames'] = df.groupby('First_Name')['First_Name'].transform('count').fillna(nasum)

Input (the question's data plus an extra row 7 with a missing name)

  First_Name Favorite_Color
0      Jared           Blue
1       Lily           Blue
2      Sarah           Pink
3       Bill            Red
4       Bill         Yellow
5     Alfred         Orange
6       None            Red
7       None           Pink

Output

  First_Name Favorite_Color  countNames
0      Jared           Blue         1.0
1       Lily           Blue         1.0
2      Sarah           Pink         1.0
3       Bill            Red         2.0
4       Bill         Yellow         2.0
5     Alfred         Orange         1.0
6       None            Red         2.0
7       None           Pink         2.0
answered Sep 30 '22 06:09 by moys
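A possible alternative, not taken from the answer above: a reasonably recent pandas (the dropna parameter of groupby was added in 1.1) can keep missing names as their own group, so no separate NaN count is needed. This is a sketch on the question's original seven rows, assuming such a version is installed:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "First_Name": ["Jared", "Lily", "Sarah", "Bill", "Bill", "Alfred", None],
        "Favorite_Color": ["Blue", "Blue", "Pink", "Red", "Yellow", "Orange", "Red"],
    }
)

# dropna=False keeps the rows with a missing name together as one group.
# "size" counts rows per group, whereas "count" would skip the NaN values.
df["countNames"] = (
    df.groupby("First_Name", dropna=False)["First_Name"].transform("size")
)

print(df["countNames"].tolist())
```

Unlike the fillna approach, the resulting column stays an integer column, because no NaN placeholders are ever produced.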