Marking Duplicates while ignoring null values in pandas

Question

I have been trying to highlight duplicate values in my df using the code below.

ncns['D-Account'] = ncns.duplicated('Account Number')

It marks the duplicates correctly, but it also marks the blank values as duplicates.

Please suggest a way to make it ignore blank values.

jezrael · Accepted Answer

If the blanks are missing values (NaN), chain a second mask that tests for non-missing values with Series.notna, combining the two with & for bitwise AND:

ncns['D-Account'] = ncns.duplicated('Account Number') & ncns['Account Number'].notna()
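
To see why the extra mask is needed, here is a minimal sketch on made-up data (the sample values are only an assumption for illustration, not the asker's real data). duplicated treats repeated NaN values as duplicates of each other, so the second NaN is flagged until the notna mask removes it:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: two real duplicates ("A1") and two missing values.
ncns = pd.DataFrame({"Account Number": ["A1", np.nan, "A1", np.nan, "B2"]})

# Plain duplicated() flags the second "A1" and also the second NaN.
raw = ncns.duplicated("Account Number")

# AND-ing with notna() keeps only flags on rows that actually hold a value.
ncns["D-Account"] = raw & ncns["Account Number"].notna()

print(raw.tolist())
print(ncns["D-Account"].tolist())
```

With the default keep='first', only the later occurrences are marked. If every member of a duplicated group should be marked, passing keep=False to duplicated should do it, and the same notna mask still applies.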

If the blanks are empty strings, test for values not equal to '' with Series.ne instead:

ncns['D-Account'] = ncns.duplicated('Account Number') & ncns['Account Number'].ne('')
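
If you are not sure which kind of blank the column holds, or it may hold both, the two masks can be combined. This is only a sketch on made-up data, and the not_blank name is a helper introduced here for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mixing empty strings and missing values.
ncns = pd.DataFrame(
    {"Account Number": ["A1", "", "A1", "", np.nan, np.nan, "B2"]}
)

acct = ncns["Account Number"]

# A row counts as non-blank only if it is neither NaN nor an empty string.
not_blank = acct.notna() & acct.ne("")

# Flag duplicates, then drop the flags that landed on blank rows.
ncns["D-Account"] = ncns.duplicated("Account Number") & not_blank

print(ncns)
```

Only the second "A1" ends up flagged here; the repeated '' and NaN rows stay False.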
