 

Intersection of two or more DataFrame columns

I am trying to find the intersection of the column names of three DataFrames, but np.intersect1d does not accept three inputs.

import numpy as np
import pandas as pd
df1 = pd.DataFrame(np.random.randint(0,10,size=(10, 4)), columns=list('ABCD'))
df2 = pd.DataFrame(np.random.randint(0,10,size=(10, 4)), columns=list('BCDE'))
df3 = pd.DataFrame(np.random.randint(0,10,size=(10, 4)), columns=list('CDEF'))

inclusive_list = np.intersect1d(df1.columns, df2.columns, df3.columns)

Error:

ValueError: The truth value of a Index is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

The inclusive_list should only include column names C & D. Any help would be appreciated. Thank you.

Starbucks asked Oct 21 '25 19:10



1 Answer

Why your current approach doesn't work:

intersect1d does not take N arrays; it only compares two.

numpy.intersect1d(ar1, ar2, assume_unique=False, return_indices=False)

You can see from the signature that the third array is being passed as the assume_unique parameter. NumPy then evaluates that Index as if it were a single boolean, which is ambiguous for a multi-element Index, so you receive a ValueError.
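The failure can be reproduced without NumPy at all. The following is a minimal sketch, assuming only that a multi-element Index is used where a boolean is expected; the variable names cols and message are made up for illustration:

```python
import pandas as pd

# A column Index like the one passed by mistake as assume_unique
cols = pd.Index(list("CDEF"))

try:
    # Forcing a multi-element Index into a boolean context is ambiguous
    bool(cols)
except ValueError as exc:
    message = str(exc)
    print(message)
```

The printed message should match the error from the question, which suggests the third argument is being read as a flag rather than as data.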


You can extend the functionality of intersect1d to work on N arrays using functools.reduce:

from functools import reduce
reduce(np.intersect1d, (df1.columns, df2.columns, df3.columns))

array(['C', 'D'], dtype=object)
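To make it clearer what reduce is doing here, this sketch compares it with the equivalent nested calls. It uses plain arrays of the column letters from the question instead of random DataFrames; the names a, b, c, nested and folded are illustrative only:

```python
from functools import reduce

import numpy as np

# Column labels from the question, as plain arrays
a = np.array(list("ABCD"))
b = np.array(list("BCDE"))
c = np.array(list("CDEF"))

# reduce folds the inputs pairwise, left to right:
# first intersect a with b, then intersect that result with c
nested = np.intersect1d(np.intersect1d(a, b), c)
folded = reduce(np.intersect1d, (a, b, c))

print(nested)
print(folded)
```

Both expressions should give the same labels, since each intermediate result is just another array that can be fed back into intersect1d.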

A better approach

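However, the easiest approach is to use the intersection method on the Index objects directly. (Older pandas versions also allowed df1.columns & df2.columns & df3.columns, but using & as a set operation on an Index was deprecated and no longer behaves that way in pandas 2.0 and later.)

df1.columns.intersection(df2.columns).intersection(df3.columns)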

Index(['C', 'D'], dtype='object')
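If the number of DataFrames is not fixed, the two ideas in this answer can be combined. This is a sketch under the assumption that at least one frame is passed; common_columns is a hypothetical helper name, not a pandas function:

```python
from functools import reduce

import pandas as pd


def common_columns(*frames):
    """Hypothetical helper: column labels shared by every frame.

    Assumes at least one DataFrame is passed; reduce raises TypeError
    on an empty sequence.
    """
    # Fold Index.intersection over the column Index of each frame
    return reduce(pd.Index.intersection, (frame.columns for frame in frames))


# Empty frames are enough here, since only the column labels matter
df1 = pd.DataFrame(columns=list("ABCD"))
df2 = pd.DataFrame(columns=list("BCDE"))
df3 = pd.DataFrame(columns=list("CDEF"))

shared = common_columns(df1, df2, df3)
print(list(shared))
```

The result is still an Index, so it can be used directly to select the shared columns, for example df1[shared].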
user3483203 answered Oct 23 '25 09:10
