I'm trying to output the frequency of each value in the First_Name column of my data frame, row by row. I have that working, but I would also like to count NaN values as well as non-NaN values per row.
Below is a data frame with two columns: First_Name and Favorite_Color. I wanted a count for the First_Name column. When I run the code, I only get a count for non-NaN values. Is there a way to also include a count of NaN values and make it part of the data frame?
import pandas as pd
d = {
    'First_Name': ["Jared", "Lily", "Sarah", "Bill", "Bill", "Alfred", None],
    'Favorite_Color': ["Blue", "Blue", "Pink", "Red", "Yellow", "Orange", "Red"],
}
df = pd.DataFrame(data=d)
df['countNames'] = df.groupby('First_Name')['First_Name'].transform('count')
print(df)
I expected to get a count of both NaN and non-NaN values, but I only got a count for non-NaN values.
I really enjoyed reading everyone's answers. It's interesting to see so many different solutions to this! I think SH-SF's answer is nice because it's a bit easier to understand, although it does rely on the numpy library.
To count the NaN values in a column of a pandas DataFrame, we can chain the isna() method with sum().
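A minimal sketch of that idea, using the data from the question (the variable name n_missing is only for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'First_Name': ["Jared", "Lily", "Sarah", "Bill", "Bill", "Alfred", None],
    'Favorite_Color': ["Blue", "Blue", "Pink", "Red", "Yellow", "Orange", "Red"],
})

# isna() marks each missing entry as True; sum() counts the True values
n_missing = df['First_Name'].isna().sum()
print(n_missing)
```

This gives one number for the whole column, so it still has to be attached to the NaN rows separately if it is meant to appear per row.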
Checking for missing values using isnull() and notnull()
To check for missing values in a pandas DataFrame, we use the functions isnull() and notnull(). Both functions help in checking whether a value is NaN or not.
The count() method directly gives the number of non-NaN values in each column. So, if we know the total number of observations, we can get the number of NaN values by subtraction. The isnull() function returns an object of the same shape containing True and False values.
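The subtraction described above might look like this; the names total, non_missing and missing are made up for the example, and the Series holds the First_Name values from the question:

```python
import pandas as pd

names = pd.Series(["Jared", "Lily", "Sarah", "Bill", "Bill", "Alfred", None])

total = len(names)            # every observation, missing or not
non_missing = names.count()   # count() skips NaN/None
missing = total - non_missing

# isnull() returns a boolean Series, so summing it gives the same number
assert missing == names.isnull().sum()
print(total, non_missing, missing)
```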
IIUC, this should fulfill your needs.
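import numpy as np
# Number of missing names; transform('count') leaves NaN on rows whose name is missing
nasum = df['First_Name'].isnull().sum()
df['countNames'] = df.groupby('First_Name')['First_Name'].transform('count').replace(np.nan, nasum)
Or, as suggested by ALollz, the code below gives the same result without needing numpy: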
df['countNames'] = df.groupby('First_Name')['First_Name'].transform('count').fillna(nasum)
Input
  First_Name Favorite_Color
0      Jared           Blue
1       Lily           Blue
2      Sarah           Pink
3       Bill            Red
4       Bill         Yellow
5     Alfred         Orange
6       None            Red
7       None           Pink
Output
  First_Name Favorite_Color  countNames
0      Jared           Blue         1.0
1       Lily           Blue         1.0
2      Sarah           Pink         1.0
3       Bill            Red         2.0
4       Bill         Yellow         2.0
5     Alfred         Orange         1.0
6       None            Red         2.0
7       None           Pink         2.0
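Another possible approach, not taken from the answers above: assuming a reasonably recent pandas (groupby gained the dropna parameter in version 1.1), passing dropna=False keeps the rows with a missing name as a group of their own, and transform('size') counts every row in each group. A sketch using the 8-row input shown above:

```python
import pandas as pd

df = pd.DataFrame({
    'First_Name': ["Jared", "Lily", "Sarah", "Bill", "Bill", "Alfred", None, None],
    'Favorite_Color': ["Blue", "Blue", "Pink", "Red", "Yellow", "Orange", "Red", "Pink"],
})

# dropna=False keeps the NaN/None names as their own group;
# 'size' counts all rows in a group, whereas 'count' skips missing values
df['countNames'] = (
    df.groupby('First_Name', dropna=False)['First_Name'].transform('size')
)
print(df)
```

Since nothing has to be filled in afterwards, this avoids the separate nasum step.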