Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Count null/NaN values in a dataframe across columns

I'm trying to count the number of unique values for each row across the columns of a dataframe.

More context in my previous post and my answer

Here is the current dataframe:

[in] df
[out] 
         PID         CID      PPID        PPPID       PPPPID        PPPPPID
    0   2015-01-02   456      2014-01-02  2014-01-02  2014-01-02    2014-01-02
    1   2015-02-02   500      2014-02-02  2013-02-02  2012-02-02    2012-02-10  
    2   2010-12-04   300      2010-12-04  2010-12-04  2010-12-04    2010-12-04 

All columns except CID (contract_ID) are datetimes. I'd like to add another column to the dataframe that counts the number of unique datetimes in each row (for the purpose of finding out how many contracts there are in a "chain").

I've tried different implementations of the .count() and .sum() methods but can't get them to work on a row-by-row basis (output is all rows with the same value).

Example:

df_merged['COUNT'] = df_merged2.count(axis=1)

Fills the entire 'COUNT' column with '6' when I'd like to be different for each row.

Removing the axis=1 argument makes the entire column 'NaN'

like image 474
VorpalBunnie Avatar asked Nov 18 '25 08:11

VorpalBunnie


2 Answers

You need apply(your_func, axis=1) to work on a row-by-row basis.

df

Out[19]: 
          PID  CID        PPID       PPPID      PPPPID     PPPPPID
0  2015-01-02  456  2014-01-02  2014-01-02  2014-01-02  2014-01-02
1  2015-02-02  500  2014-02-02  2013-02-02  2012-02-02  2012-02-10
2  2010-12-04  300  2010-12-04  2010-12-04  2010-12-04  2010-12-04



df['counts'] = df.drop('CID', axis=1).apply(lambda row: len(pd.unique(row)), axis=1)

Out[20]: 
          PID  CID        PPID       PPPID      PPPPID     PPPPPID  counts
0  2015-01-02  456  2014-01-02  2014-01-02  2014-01-02  2014-01-02       2
1  2015-02-02  500  2014-02-02  2013-02-02  2012-02-02  2012-02-10       5
2  2010-12-04  300  2010-12-04  2010-12-04  2010-12-04  2010-12-04       1

[3 rows x 7 columns]
like image 197
Jianxun Li Avatar answered Nov 19 '25 23:11

Jianxun Li


Another way would be to call unique on the transpose of your df:

In [26]:    
df['counts'] = df.drop('CID', axis=1).T.apply(lambda x: len(pd.Series.unique(x)))
df

Out[26]:
          PID  CID        PPID       PPPID      PPPPID     PPPPPID  counts
0  2015-01-02  456  2014-01-02  2014-01-02  2014-01-02  2014-01-02       2
1  2015-02-02  500  2014-02-02  2013-02-02  2012-02-02  2012-02-10       5
2  2010-12-04  300  2010-12-04  2010-12-04  2010-12-04  2010-12-04       1
like image 28
EdChum Avatar answered Nov 19 '25 23:11

EdChum



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!