
Pandas check if value in one multiindex column is in any column, same row of different multiindex

I have the following dataframe:

import pandas as pd

df = pd.DataFrame([[0, 1, 7, 0, 1, 8, 3, 0],
                   [7, 3, 4, 0, 4, 9, 7, 0]], 
                  columns=pd.MultiIndex.from_product([["first", "second"], 
                                                      ["A", "B", "C", "D"]]))
print(df)

  first          second         
      A  B  C  D      A  B  C  D
0     0  1  7  0      1  8  3  0
1     7  3  4  0      4  9  7  0

I want to check whether each value in first is present in any of the columns of second. Only values in the same row should be compared.

The resulting dataframe should look like this:

      A      B      C     D
0  True   True  False  True
1  True  False   True  True

What is the best way of doing this? I have already tried df["first"].isin(df["second"]), but it only compares A with A, B with B, and so on. I also tried combining it with .any(), but I can't get it to work.

Your help is greatly appreciated!

Thank you in advance.

mkoeck asked Mar 25 '21 14:03


2 Answers

Using NumPy broadcasting:

import numpy as np
np.any(df['first'].T.values[:, :, None] == df['second'].values, axis=-1).T

array([[ True,  True, False,  True],
       [ True, False,  True,  True]])
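
The one-liner returns a bare NumPy array, while the question asks for a labeled dataframe. A minimal sketch of the same broadcasting idea, split into steps and wrapped back into a DataFrame, might look like this. The variable names first, second, matches and result are illustrative and not part of the original answer, and the axes are arranged slightly differently so that no transpose is needed:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[0, 1, 7, 0, 1, 8, 3, 0],
     [7, 3, 4, 0, 4, 9, 7, 0]],
    columns=pd.MultiIndex.from_product([["first", "second"],
                                        ["A", "B", "C", "D"]]),
)

first = df["first"].to_numpy()    # shape (n_rows, 4)
second = df["second"].to_numpy()  # shape (n_rows, 4)

# Add a trailing axis to `first` and a middle axis to `second` so that
# every value of `first` in a row is compared with every value of
# `second` in the same row: the result has shape (n_rows, 4, 4).
matches = first[:, :, None] == second[:, None, :]

# Collapse the last axis: True if the value matched any `second` column.
result = pd.DataFrame(matches.any(axis=-1),
                      index=df.index,
                      columns=df["first"].columns)
print(result)
```

The intermediate array holds one boolean per (row, first column, second column) combination, so memory use grows with the product of the two column counts.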
Shubham Sharma answered Sep 28 '22 10:09


Another solution:

df['first'].apply(lambda x: x.isin(df.loc[x.name, 'second']), axis=1)

Output:

      A      B      C     D
0  True   True  False  True
1  True  False   True  True
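
For reference, a self-contained sketch of this row-wise approach is shown here, with the lambda written as a named helper. The name row_isin is hypothetical, and the sketch assumes the row labels are unique, since df.loc with a duplicated label would return several rows instead of one:

```python
import pandas as pd

df = pd.DataFrame(
    [[0, 1, 7, 0, 1, 8, 3, 0],
     [7, 3, 4, 0, 4, 9, 7, 0]],
    columns=pd.MultiIndex.from_product([["first", "second"],
                                        ["A", "B", "C", "D"]]),
)

def row_isin(row):
    """Check one row of df['first'] against the same row of df['second']."""
    # row.name is the row label; Series.isin treats the passed Series
    # as a plain collection of values, so no label alignment happens.
    return row.isin(df.loc[row.name, "second"])

result = df["first"].apply(row_isin, axis=1)
print(result)
```

Because apply with axis=1 calls the helper once per row in Python, this version is likely to be slower than the broadcasting approach on large frames, but it keeps the row and column labels without extra work.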
ashkangh answered Sep 28 '22 10:09