Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Count number of columns with some values for each row in pandas

I have dataframe like this, data:

Site code    Col1  Col2  Col3
A5252        24    53     NaN
A5636        36    NaN    NaN
A4366        NaN   NaN    NaN
A7578        42    785    24

And I want to count a number of columns with some value, but none NaN. Desired output:

 Site code   Col1  Col2  Col3  Count
    A5252     24    53     NaN    2
    A5636     36    NaN    NaN    1
    A4366     NaN   NaN    NaN    0
    A7578     42    785    24     3

Something oposite to this: df = data.isnull().sum(axis=1)

like image 661
jovicbg Avatar asked Jun 23 '17 08:06

jovicbg


People also ask

How do you count occurrences of specific value in Pandas row?

We can count by using the value_counts() method. This function is used to count the values present in the entire dataframe and also count values in a particular column.

How do I count the number of occurrences in a column in Pandas?

Using the size() or count() method with pandas. DataFrame. groupby() will generate the count of a number of occurrences of data present in a particular column of the dataframe.

What is difference between count () and Value_counts?

count() should be used when you want to find the frequency of valid values present in columns with respect to specified col . . value_counts() should be used to find the frequencies of a series.

How do you find the number of missing values in each row in Pandas?

Checking for missing values using isnull() and notnull() In order to check missing values in Pandas DataFrame, we use a function isnull() and notnull().


2 Answers

Need change isnull to notnull:

#if first columns is not index, set it
data = data.set_index('Site code')
data['Count'] = data.notnull().sum(axis=1)

Or use function DataFrame.count:

data = data.set_index('Site code')
data['Count'] = data.count(axis=1)
print (data)
           Col1   Col2  Col3  Count
Site code                          
A5252      24.0   53.0   NaN      2
A5636      36.0    NaN   NaN      1
A4366       NaN    NaN   NaN      0
A7578      42.0  785.0  24.0      3

Another solution with selecting columns by loc (Site code is column, not index):

print (data.loc[:, 'Col1':])
   Col1   Col2  Col3
0  24.0   53.0   NaN
1  36.0    NaN   NaN
2   NaN    NaN   NaN
3  42.0  785.0  24.0

data['Count'] = data.loc[:, 'Col1':].count(axis=1)
print (data)
  Site code  Col1   Col2  Col3  Count
0     A5252  24.0   53.0   NaN      2
1     A5636  36.0    NaN   NaN      1
2     A4366   NaN    NaN   NaN      0
3     A7578  42.0  785.0  24.0      3

Another nice idea from Jon Clements - use filter:

data['Count'] = data.filter(regex="^Col").count(axis=1)
print (data)

  Site code  Col1   Col2  Col3  Count
0     A5252  24.0   53.0   NaN      2
1     A5636  36.0    NaN   NaN      1
2     A4366   NaN    NaN   NaN      0
3     A7578  42.0  785.0  24.0      3
like image 54
jezrael Avatar answered Oct 21 '22 14:10

jezrael


Simple use notnull()

import pandas as pd
df = pd.read_csv("your_csv.csv")

df['count'] = df.notnull().sum(axis=1)

print(df)

Also to add a column to a dataframe just use:

df['new_column_name'] = newcolumn

output:

Site code   Col1  Col 2  Col3  count
    A5252     24    53     NaN    2
    A5636     36    NaN    NaN    1
    A4366     NaN   NaN    NaN    0
    A7578     42    785    24     3
like image 42
void Avatar answered Oct 21 '22 13:10

void