I have a dataFrame with 2 columns a A and B. I have to separate out subset of dataFrames using pandas to delete all the duplicate values.
For Example
My dataFrame looks like this
**A B**
1 1
2 3
4 4
8 8
5 6
4 7
Then the output should be
**A B**
1 1 <--- both values Highlighted
2 3
4 4 <--- both values Highlighted
8 8 <--- both values Highlighted
5 6
4 7 <--- value in column A highlighted
How do I do that?
Thanks in advance.
You can use this:
def color_dupes(x):
c1='background-color:red'
c2=''
cond=x.stack().duplicated(keep=False).unstack()
df1 = pd.DataFrame(np.where(cond,c1,c2),columns=x.columns,index=x.index)
return df1
df.style.apply(color_dupes,axis=None)
# if df has many columns: df.style.apply(color_dupes,axis=None,subset=['A','B'])
Example working code:

Explanation:
First we stack the dataframe so as to bring all the columns into a series and find duplicated with keep=False to mark all duplicates as true:
df.stack().duplicated(keep=False)
0 A True
B True
1 A False
B False
2 A True
B True
3 A True
B True
4 A False
B False
5 A True
B False
dtype: bool
After this we unstack() the dataframe which gives a boolean dataframe with the same dataframe structure:
df.stack().duplicated(keep=False).unstack()
A B
0 True True
1 False False
2 True True
3 True True
4 False False
5 True False
Once we have this we assign the background color to values if True else no color using np.where
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With