I'm trying to figure out how to output the frequency of each value in the First_Name column of my data frame, per row. So far I have been successful, but I would also like to know how to count both NaN and non-NaN values per row.
Below is a data frame with two columns: First_Name and Favorite_Color. I wanted to get a count for the First_Name column. When I run the code, I only get a count of non-NaN values. Is there a way to also include a count of NaN values and make it part of the data frame?
import pandas as pd
d = {
    'First_Name': ["Jared", "Lily", "Sarah", "Bill", "Bill", "Alfred", None],
    'Favorite_Color': ["Blue", "Blue", "Pink", "Red", "Yellow", "Orange", "Red"]
}
df = pd.DataFrame(data=d)
df['countNames'] = df.groupby('First_Name')['First_Name'].transform('count')
print(df)
I expected to get a count of both NaN and non NaN values but I only got a count for Non-NaN values.
I really enjoyed reading everyone's answers. It's really interesting to see so many different solutions to this! I think SH-SF's answer is nice because it's a bit easier to understand, although it does need the numpy library.
To count the NaN values in a column of a pandas DataFrame, we can chain the isna() method with sum().
Checking for missing values using isnull() and notnull()
To check for missing values in a pandas DataFrame, we use the isnull() and notnull() methods. Both help in checking whether a value is NaN or not.
The count() method directly gives the number of non-NaN values in each column. So, if we know the total number of observations, we can get the number of NaN values by subtraction. The isnull() method returns an object of the same shape containing True and False values.
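As a minimal sketch of the counting described above, assuming the same sample data as in the question (the names n_missing, n_present and n_total are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "First_Name": ["Jared", "Lily", "Sarah", "Bill", "Bill", "Alfred", None],
    "Favorite_Color": ["Blue", "Blue", "Pink", "Red", "Yellow", "Orange", "Red"],
})

names = df["First_Name"]

# isna() marks missing entries as True; summing the booleans counts them.
n_missing = int(names.isna().sum())

# count() returns the number of non-missing entries.
n_present = int(names.count())

# The total number of rows is the sum of the two.
n_total = len(names)

print(n_missing, n_present, n_total)  # 1 6 7
```

Note that these are counts for the whole column, not per row; attaching a count to every row still needs a groupby or similar.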
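IIUC, this should fulfill your needs. The groupby drops rows whose First_Name is NaN, so transform('count') leaves NaN in those rows; we then fill them with the total number of NaN names.
import numpy as np
nasum = df['First_Name'].isnull().sum()
df['countNames'] = df.groupby('First_Name')['First_Name'].transform('count').replace(np.nan, nasum)
Or, as suggested by ALollz, the code below gives the same result without numpy:
df['countNames'] = df.groupby('First_Name')['First_Name'].transform('count').fillna(nasum)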
Input
  First_Name Favorite_Color
0      Jared           Blue
1       Lily           Blue
2      Sarah           Pink
3       Bill            Red
4       Bill         Yellow
5     Alfred         Orange
6       None            Red
7       None           Pink
Output
  First_Name Favorite_Color  countNames
0      Jared           Blue         1.0
1       Lily           Blue         1.0
2      Sarah           Pink         1.0
3       Bill            Red         2.0
4       Bill         Yellow         2.0
5     Alfred         Orange         1.0
6       None            Red         2.0
7       None           Pink         2.0
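If you would rather not compute the NaN total separately, one possible alternative is to replace the missing names with a placeholder key before grouping, so the missing rows form a group of their own. This is only a sketch: the placeholder string "__missing__" is an assumption made for illustration and must not collide with a real name, and the data is the 7-row frame from the question rather than the 8-row input shown above.

```python
import pandas as pd

df = pd.DataFrame({
    "First_Name": ["Jared", "Lily", "Sarah", "Bill", "Bill", "Alfred", None],
    "Favorite_Color": ["Blue", "Blue", "Pink", "Red", "Yellow", "Orange", "Red"],
})

# Hypothetical placeholder: any string that cannot occur as a real name.
key = df["First_Name"].fillna("__missing__")

# Group the key by its own values; 'size' counts the rows in each group,
# including the group made of the placeholder rows.
df["countNames"] = key.groupby(key).transform("size")

print(df)
```

Using 'size' here should give integer counts rather than floats, and the First_Name column itself is left unchanged. Recent pandas versions also accept dropna=False in groupby, which may make the placeholder unnecessary, but this sketch does not rely on it.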