
Marking Duplicates while ignoring null values in pandas

Tags:

python

pandas

I have been trying to mark duplicate values in my DataFrame using the code below.

ncns['D-Account'] = ncns.duplicated('Account Number')

It marks the duplicates correctly, but it also marks the blank values as duplicates.

How can I make it ignore blank values?

Husnain Iqbal asked Oct 29 '25 16:10



1 Answer

If the blanks are missing values (NaN), combine the duplicate mask with a mask of non-missing values from Series.notna, chaining the two with & (bitwise AND):

ncns['D-Account'] = ncns.duplicated('Account Number') & ncns['Account Number'].notna()
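To see why the extra mask is needed, here is a minimal sketch on a made-up frame. Only the column names come from the question; the account values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column name is taken from the question.
ncns = pd.DataFrame({'Account Number': ['A1', np.nan, 'A1', np.nan, 'B2']})

# duplicated() treats missing values as equal to each other,
# so the second NaN is flagged as a duplicate of the first.
plain = ncns.duplicated('Account Number')

# AND-ing with notna() clears the flag on every row whose value is missing.
ncns['D-Account'] = plain & ncns['Account Number'].notna()

print(ncns)
```

Here `plain` flags both the second 'A1' and the second NaN, while 'D-Account' keeps only the second 'A1'.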

If the blanks are empty strings, test for values not equal to '' with Series.ne instead:

ncns['D-Account'] = ncns.duplicated('Account Number') & ncns['Account Number'].ne('')
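If the column might mix several kinds of blanks, one possible approach is to build a single "is blank" mask first. The sketch below assumes that NaN, empty strings, and whitespace-only strings should all count as blank. That is an assumption about the data, not something stated in the question, and the sample values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mixing NaN, '' and whitespace-only blanks.
ncns = pd.DataFrame(
    {'Account Number': ['A1', '', 'A1', np.nan, '', '  ', np.nan]}
)

col = ncns['Account Number']

# Assumption: NaN, '' and whitespace-only strings are all "blank".
# fillna('') maps NaN to '', and strip() reduces whitespace-only strings to ''.
is_blank = col.fillna('').astype(str).str.strip().eq('')

# Flag a row only if it repeats an earlier value and is not blank.
ncns['D-Account'] = ncns.duplicated('Account Number') & ~is_blank

print(ncns)
```

Only the second 'A1' ends up flagged. The repeated '' and NaN rows are masked out.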
jezrael answered Oct 31 '25 05:10



