I have dataframe like this, data:
Site code Col1 Col2 Col3
A5252 24 53 NaN
A5636 36 NaN NaN
A4366 NaN NaN NaN
A7578 42 785 24
And I want to count a number of columns with some value, but none NaN. Desired output:
Site code Col1 Col2 Col3 Count
A5252 24 53 NaN 2
A5636 36 NaN NaN 1
A4366 NaN NaN NaN 0
A7578 42 785 24 3
Something oposite to this: df = data.isnull().sum(axis=1)
We can count by using the value_counts() method. This function is used to count the values present in the entire dataframe and also count values in a particular column.
Using the size() or count() method with pandas. DataFrame. groupby() will generate the count of a number of occurrences of data present in a particular column of the dataframe.
count() should be used when you want to find the frequency of valid values present in columns with respect to specified col . . value_counts() should be used to find the frequencies of a series.
Checking for missing values using isnull() and notnull() In order to check missing values in Pandas DataFrame, we use a function isnull() and notnull().
Need change isnull
to notnull
:
#if first columns is not index, set it
data = data.set_index('Site code')
data['Count'] = data.notnull().sum(axis=1)
Or use function DataFrame.count
:
data = data.set_index('Site code')
data['Count'] = data.count(axis=1)
print (data)
Col1 Col2 Col3 Count
Site code
A5252 24.0 53.0 NaN 2
A5636 36.0 NaN NaN 1
A4366 NaN NaN NaN 0
A7578 42.0 785.0 24.0 3
Another solution with selecting columns by loc
(Site code
is column, not index
):
print (data.loc[:, 'Col1':])
Col1 Col2 Col3
0 24.0 53.0 NaN
1 36.0 NaN NaN
2 NaN NaN NaN
3 42.0 785.0 24.0
data['Count'] = data.loc[:, 'Col1':].count(axis=1)
print (data)
Site code Col1 Col2 Col3 Count
0 A5252 24.0 53.0 NaN 2
1 A5636 36.0 NaN NaN 1
2 A4366 NaN NaN NaN 0
3 A7578 42.0 785.0 24.0 3
Another nice idea from Jon Clements - use filter
:
data['Count'] = data.filter(regex="^Col").count(axis=1)
print (data)
Site code Col1 Col2 Col3 Count
0 A5252 24.0 53.0 NaN 2
1 A5636 36.0 NaN NaN 1
2 A4366 NaN NaN NaN 0
3 A7578 42.0 785.0 24.0 3
Simple use notnull()
import pandas as pd
df = pd.read_csv("your_csv.csv")
df['count'] = df.notnull().sum(axis=1)
print(df)
Also to add a column to a dataframe just use:
df['new_column_name'] = newcolumn
output:
Site code Col1 Col 2 Col3 count
A5252 24 53 NaN 2
A5636 36 NaN NaN 1
A4366 NaN NaN NaN 0
A7578 42 785 24 3
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With