 

Conditional Formatting on duplicate values using pandas

I have a DataFrame with two columns, A and B. Using pandas, I need to pick out and highlight every value that appears more than once anywhere in the DataFrame.

For example, my DataFrame looks like this:

**A     B**
1     1
2     3
4     4
8     8 
5     6
4     7

Then the output should be

**A     B**
1     1       <--- both values Highlighted
2     3
4     4       <--- both values Highlighted
8     8       <--- both values Highlighted 
5     6
4     7       <--- value in column A highlighted

How do I do that?

Thanks in advance.

Twinkle Lahariya asked Mar 05 '26 21:03



1 Answer

You can use this:

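import numpy as np
import pandas as pd

def color_dupes(x):
    # CSS for duplicated cells; the empty string leaves a cell unstyled
    c1 = 'background-color:red'
    c2 = ''
    # Flatten all columns into one Series, flag every repeated value, restore the shape
    cond = x.stack().duplicated(keep=False).unstack()
    return pd.DataFrame(np.where(cond, c1, c2), columns=x.columns, index=x.index)

# axis=None passes the whole DataFrame to color_dupes in one call
df.style.apply(color_dupes, axis=None)
# if df has many columns: df.style.apply(color_dupes, axis=None, subset=['A', 'B'])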

Explanation: First, we stack the DataFrame to bring all the columns into a single Series, then call duplicated with keep=False so that every occurrence of a repeated value is marked True:

df.stack().duplicated(keep=False)

0  A     True
   B     True
1  A    False
   B    False
2  A     True
   B     True
3  A     True
   B     True
4  A    False
   B    False
5  A     True
   B    False
dtype: bool

After this, we unstack() the result, which gives a boolean DataFrame with the same shape as the original:

df.stack().duplicated(keep=False).unstack()
       A      B
0   True   True
1  False  False
2   True   True
3   True   True
4  False  False
5   True  False
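
The two steps above can be reproduced outside of a Styler. Here is a minimal sketch, assuming the sample data from the question; the variable names (stacked, flags, mask) are only illustrative:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"A": [1, 2, 4, 8, 5, 4], "B": [1, 3, 4, 8, 6, 7]})

# One Series with a (row, column) MultiIndex, values in row-major order
stacked = df.stack()

# keep=False flags every occurrence of a repeated value, not only the later ones
flags = stacked.duplicated(keep=False)

# Move the column level back out so the mask lines up with df
mask = flags.unstack()
print(mask)
```

Because the values are compared across the whole stacked Series, the 4 in row 5 of column A is flagged even though its own row has no matching value in column B.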

Once we have this mask, we use np.where to assign the background color where the value is True and no style otherwise.
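
As a rough illustration of that last step, the sketch below builds the table of CSS strings directly from the mask, again assuming the question's sample data; styles is just an illustrative name:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 4, 8, 5, 4], "B": [1, 3, 4, 8, 6, 7]})
mask = df.stack().duplicated(keep=False).unstack()

# np.where returns a plain array, so wrap it to restore the row and column labels
styles = pd.DataFrame(
    np.where(mask, "background-color:red", ""),
    index=df.index,
    columns=df.columns,
)
print(styles)
```

With axis=None, Styler.apply expects the function to return a DataFrame of CSS strings with the same index and columns as its input, which is why the labels are put back after np.where.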

anky answered Mar 07 '26 11:03



