Count null/NaN values in a dataframe across columns

Question

I'm trying to count the number of unique values for each row across the columns of a dataframe.

More context in my previous post and my answer

Here is the current dataframe:

[in] df
[out] 
         PID         CID      PPID        PPPID       PPPPID        PPPPPID
    0   2015-01-02   456      2014-01-02  2014-01-02  2014-01-02    2014-01-02
    1   2015-02-02   500      2014-02-02  2013-02-02  2012-02-02    2012-02-10  
    2   2010-12-04   300      2010-12-04  2010-12-04  2010-12-04    2010-12-04

All columns except CID (contract_ID) are datetimes. I'd like to add another column to the dataframe that counts the number of unique datetimes in each row (for the purpose of finding out how many contracts there are in a "chain").

I've tried different implementations of the .count() and .sum() methods but can't get them to work on a row-by-row basis (output is all rows with the same value).

Example:

df_merged['COUNT'] = df_merged2.count(axis=1)

Fills the entire 'COUNT' column with '6' when I'd like to be different for each row.

Removing the axis=1 argument makes the entire column 'NaN'

Jianxun Li · Accepted Answer

You need apply(your_func, axis=1) to work on a row-by-row basis.

df

Out[19]: 
          PID  CID        PPID       PPPID      PPPPID     PPPPPID
0  2015-01-02  456  2014-01-02  2014-01-02  2014-01-02  2014-01-02
1  2015-02-02  500  2014-02-02  2013-02-02  2012-02-02  2012-02-10
2  2010-12-04  300  2010-12-04  2010-12-04  2010-12-04  2010-12-04



df['counts'] = df.drop('CID', axis=1).apply(lambda row: len(pd.unique(row)), axis=1)

Out[20]: 
          PID  CID        PPID       PPPID      PPPPID     PPPPPID  counts
0  2015-01-02  456  2014-01-02  2014-01-02  2014-01-02  2014-01-02       2
1  2015-02-02  500  2014-02-02  2013-02-02  2012-02-02  2012-02-10       5
2  2010-12-04  300  2010-12-04  2010-12-04  2010-12-04  2010-12-04       1

[3 rows x 7 columns]

EdChum · Answer

Another way would be to call unique on the transpose of your df:

In [26]:    
df['counts'] = df.drop('CID', axis=1).T.apply(lambda x: len(pd.Series.unique(x)))
df

Out[26]:
          PID  CID        PPID       PPPID      PPPPID     PPPPPID  counts
0  2015-01-02  456  2014-01-02  2014-01-02  2014-01-02  2014-01-02       2
1  2015-02-02  500  2014-02-02  2013-02-02  2012-02-02  2012-02-10       5
2  2010-12-04  300  2010-12-04  2010-12-04  2010-12-04  2010-12-04       1

Count null/NaN values in a dataframe across columns

Tags:

python

datetime

pandas

nan

VorpalBunnie

2 Answers

Jianxun Li

EdChum

Recent Activity

Donate For Us

Count null/NaN values in a dataframe across columns

Tags:

python

datetime

pandas

nan

VorpalBunnie

2 Answers

Jianxun Li

EdChum

Related questions

Recent Activity

Donate For Us