highlight column by column difference for each row in pandas

Question

Let's say I have a pd.DataFrame that looks as such:

id  col1_a  col1_b  col2_a  col2_b
1   x       x       2       3  
2   z       d       4       5
3   y       y       9       9
4   p       p       8       1

What this dataframe represents is a 2 dataframe (df_a, df_b) column by column comparison.

I am trying to get a dataframe that highlights and finds the columns that contain those differences as such:

id  col1_a  col1_b  col2_a  col2_b   diff
1   x       x       2       3        col2
2   z       d       4       5        col1,col2
3   y       y       9       9        None
4   p       p       8       1        col2

How can I achieve something like this without having to doubly traverse through the cols and rows.

I know I can achieve this by doing something similar (not tested):

for col_ptr1 in df.columns:
   for col_ptr2 in df.columns:
      for idx, row in df.iterrows():
         if col_ptr1.strip('_a') == col_ptr2.strip('_b'):
            blah blah blah...

This is super ugly. I wonder if there is a more pandas style approach to this.

Shubham Sharma · Accepted Answer

Select the subset of columns containing col, then split these column names around delimiter _ and extract the first component of split using the str accessor

Now, group the dataframe on the col prefix extracted in the previous step, and agg using nunique along axis=1 to count the unqiue values. Check for the unique values if not equal to one then add the corresponding column names in diff columns using dataframe.dot

c = df.filter(regex=r'_(a|b)$')
m = c.groupby(c.columns.str.split('_').str[0], axis=1).nunique().ne(1)
df['diff'] = m.dot(m.columns + ',').str[:-1]

   id col1_a col1_b  col2_a  col2_b       diff
0   1      x      x       2       3       col2
1   2      z      d       4       5  col1,col2
2   3      y      y       9       9           
3   4      p      p       8       1       col2

highlight column by column difference for each row in pandas

Tags:

python

pandas

dataframe

sgerbhctim

1 Answers

Shubham Sharma

Recent Activity

Donate For Us

highlight column by column difference for each row in pandas

Tags:

python

pandas

dataframe

sgerbhctim

1 Answers

Shubham Sharma

Related questions

Recent Activity

Donate For Us