Compare elements in dataframe columns for each row

I have a very large DataFrame (thousands of rows), but let's assume it looks like this:

   A  B  C  D  E  F
0  2  5  2  2  2  2
1  5  2  5  5  5  5
2  5  2  5  2  5  5
3  2  2  2  2  2  2
4  5  5  5  5  5  5

I need to find which value appears most frequently in a group of columns for each row. For instance, for each row, the most frequent value in columns ABC and the most frequent value in columns DEF, with the results put in new columns. In this example, my expected output is
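A minimal sketch that rebuilds the sample data above as a pandas DataFrame so the answers can be tried directly. The variable name df is an assumption chosen to match the answers' code.

```python
import pandas as pd

# Sample data from the question; the real frame has thousands of rows
df = pd.DataFrame(
    {
        "A": [2, 5, 5, 2, 5],
        "B": [5, 2, 2, 2, 5],
        "C": [2, 5, 5, 2, 5],
        "D": [2, 5, 2, 2, 5],
        "E": [2, 5, 5, 2, 5],
        "F": [2, 5, 5, 2, 5],
    }
)
print(df)
```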

How can I do this in Python? Thanks!

How do I compare two DataFrame column values in Python?

pandas compare method: DataFrame.compare() compares two DataFrames row by row and column by column and shows the differences side by side. It can only compare DataFrames with the same structure, that is, the same shape and the same row and column labels.
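A small illustration of DataFrame.compare (available since pandas 1.1), using two made-up frames df1 and df2 that differ in a single cell; the data is hypothetical. By default only the differing rows and columns are kept, and each kept column is split into "self" and "other" sub-columns.

```python
import pandas as pd

# Two identically labeled frames that differ in one cell (hypothetical data)
df1 = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})
df2 = df1.copy()
df2.loc[1, "y"] = 50

# Only row 1 and column y differ, so only they are kept
diff = df1.compare(df2)
print(diff)

# Frames with different labels (here: fewer rows) are rejected
mismatch_rejected = False
try:
    df1.compare(df2.iloc[:2])
except ValueError as err:
    mismatch_rejected = True
    print("ValueError:", err)
```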

How do I compare values in two columns in pandas DataFrame?

Use NumPy's where() function (np.where) and pass the comparison as the condition. For example, if 'column1' is less than 'column2' and 'column1' is less than 'column3', the value of 'column1' is kept; if the condition fails, the value is NaN. The results are stored in a new column of the DataFrame.
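A sketch of the np.where pattern just described, on a hypothetical frame named demo with made-up values in column1, column2 and column3:

```python
import numpy as np
import pandas as pd

# Hypothetical data using the column names from the description
demo = pd.DataFrame(
    {"column1": [1, 5, 2], "column2": [3, 4, 5], "column3": [2, 6, 1]}
)

# Keep column1 where it is smaller than both other columns, otherwise NaN
condition = (demo["column1"] < demo["column2"]) & (demo["column1"] < demo["column3"])
demo["result"] = np.where(condition, demo["column1"], np.nan)
print(demo)
```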

How do I iterate through every row in a DataFrame?

You can iterate over the rows of a DataFrame with either the built-in iterrows() or itertuples() method.
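A sketch of both iteration styles applied to this question's data, computing the most frequent value per row with collections.Counter (swapped in here for illustration). Counter.most_common breaks ties by first occurrence rather than by smallest value, and Python-level row loops are much slower than the vectorized answers on thousands of rows.

```python
from collections import Counter

import pandas as pd

# Sample data from the question
df = pd.DataFrame(
    [[2, 5, 2, 2, 2, 2], [5, 2, 5, 5, 5, 5], [5, 2, 5, 2, 5, 5],
     [2, 2, 2, 2, 2, 2], [5, 5, 5, 5, 5, 5]],
    columns=list("ABCDEF"),
)

# itertuples yields one namedtuple per row (fields A..F); usually faster than iterrows
abc_modes = [Counter((row.A, row.B, row.C)).most_common(1)[0][0]
             for row in df.itertuples(index=False)]

# iterrows yields (index, Series) pairs
def_modes = []
for _, row in df.iterrows():
    def_modes.append(Counter(row[["D", "E", "F"]].tolist()).most_common(1)[0][0])

print(abc_modes, def_modes)
```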

How do I compare 3 columns in pandas?

Add a new column, for example all_matching, that shows whether the values in all three columns match in a given row: it is True when all three values in that row match and False when any of them differs.
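Applied to columns A, B and C of the question's data, the all_matching idea can be sketched as follows. The nunique(axis=1) variant is an alternative that avoids chaining comparisons when there are many columns.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame(
    [[2, 5, 2, 2, 2, 2], [5, 2, 5, 5, 5, 5], [5, 2, 5, 2, 5, 5],
     [2, 2, 2, 2, 2, 2], [5, 5, 5, 5, 5, 5]],
    columns=list("ABCDEF"),
)

# True only when A, B and C hold the same value in that row
df["all_matching"] = (df["A"] == df["B"]) & (df["B"] == df["C"])

# Alternative for any number of columns: exactly one distinct value per row
same = df[["A", "B", "C"]].nunique(axis=1) == 1

print(df[["A", "B", "C", "all_matching"]])
```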

Here is one way, using groupby on the columns:

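import pandas as pd
mapperd = {'A': 'ABC', 'B': 'ABC', 'C': 'ABC', 'D': 'DEF', 'E': 'DEF', 'F': 'DEF'}
# groupby(..., axis=1) is deprecated since pandas 2.1, so group the rows of the transposed frame instead
# mode() returns the modes sorted, so iloc[0] picks the smallest one on ties
df.T.groupby(mapperd).agg(lambda x: x.mode().iloc[0]).T

Output:
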
   ABC  DEF
0    2    2
1    5    5
2    5    5
3    2    2
4    5    5
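If the frame has many columns, typing the mapping by hand gets tedious. A sketch that builds it from consecutive blocks of three columns, assuming the groups are always contiguous and equal-sized (the names group_size, blocks and mapper are illustrative):

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame(
    [[2, 5, 2, 2, 2, 2], [5, 2, 5, 5, 5, 5], [5, 2, 5, 2, 5, 5],
     [2, 2, 2, 2, 2, 2], [5, 5, 5, 5, 5, 5]],
    columns=list("ABCDEF"),
)

group_size = 3  # assumption: groups are consecutive blocks of 3 columns
blocks = [df.columns[i:i + group_size] for i in range(0, df.shape[1], group_size)]
# Map every column to the name of its block, e.g. 'A' -> 'ABC'
mapper = {col: "".join(block) for block in blocks for col in block}

# Group the rows of the transposed frame (avoids the deprecated axis=1 groupby)
result = df.T.groupby(mapper).agg(lambda x: x.mode().iloc[0]).T
print(result)
```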

For better performance, you can work with the underlying NumPy array and use scipy.stats.mode to compute the mode:

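import pandas as pd
from scipy import stats
cols = ['ABC', 'DEF']
# Split each row into consecutive groups of 3 columns: 5x6 -> 10x3 (rows alternate ABC, DEF)
a = df.to_numpy().reshape(-1, df.shape[1] // len(cols))
# Row-wise mode (smallest value on ties), folded back to one row per original row
pd.DataFrame(stats.mode(a, axis=1).mode.reshape(-1, len(cols)), index=df.index, columns=cols)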

   ABC  DEF
0    2    2
1    5    5
2    5    5
3    2    2
4    5    5
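To see what the reshape does, and to avoid the SciPy dependency, here is a NumPy-only sketch that swaps in a counting-based mode for scipy.stats.mode. It assumes contiguous, equal-sized groups and a small number of distinct values, since the intermediate comparison array has shape (reshaped rows × group size × distinct values). Like scipy.stats.mode, it returns the smallest value on ties, because np.unique sorts the values and argmax takes the first maximum.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame(
    [[2, 5, 2, 2, 2, 2], [5, 2, 5, 5, 5, 5], [5, 2, 5, 2, 5, 5],
     [2, 2, 2, 2, 2, 2], [5, 5, 5, 5, 5, 5]],
    columns=list("ABCDEF"),
)

group_size = 3  # assumption: contiguous, equal-sized column groups
a = df.to_numpy().reshape(-1, group_size)  # 5x6 -> 10x3; rows alternate ABC, DEF
print(a[:2])  # the ABC and DEF values of df's first row

# Counting-based mode, swapped in for scipy.stats.mode
values = np.unique(a)                           # sorted distinct values
counts = (a[:, :, None] == values).sum(axis=1)  # counts[r, k] = occurrences of values[k] in row r
modes = values[counts.argmax(axis=1)]           # first maximum = smallest value on ties

result = pd.DataFrame(modes.reshape(len(df), -1), index=df.index, columns=["ABC", "DEF"])
print(result)
```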

You can try selecting each group's columns by their labels:

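import pandas as pd
grp = ['ABC', 'DEF']
# list(g) turns 'ABC' into ['A', 'B', 'C']; mode(axis=1)[0] keeps the first (smallest) mode per row
pd.concat([df.loc[:, list(g)].mode(axis=1)[0].rename(g) for g in grp], axis=1)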

Output:

   ABC  DEF
0    2    2
1    5    5
2    5    5
3    2    2
4    5    5
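Ties deserve a check with this approach: DataFrame.mode(axis=1) returns one column per mode and pads rows that have fewer modes with NaN, so a row whose three values all differ yields three columns, and relabeling that result to a single name would fail. A sketch on hypothetical data showing this, and taking column 0 (the smallest mode, since modes are sorted) to keep one value per group:

```python
import pandas as pd

# Hypothetical rows: A, B, C all differ in row 0; row 1 has a clear winner
df = pd.DataFrame([[7, 3, 9, 1, 1, 4], [2, 2, 5, 5, 5, 5]], columns=list("ABCDEF"))

abc = df.loc[:, list("ABC")].mode(axis=1)
print(abc)  # one column per mode; row 1 is padded with NaN

# Keep only the first (smallest) mode so each group yields a single column
grp = ["ABC", "DEF"]
out = pd.concat([df.loc[:, list(g)].mode(axis=1)[0].rename(g) for g in grp], axis=1)
print(out)
```

The NaN padding may also turn the first mode column into floats, as can happen with ABC here.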

Tags: python, pandas, dataframe

Ally

3 Answers: BENY, yatu, Scott Boston
