I'm considering merge operations on DataFrames that each have a large number of columns, and I don't want the result to contain two columns with the same name. I'm trying to view a list of the column names the two frames have in common:
import pandas as pd
a = [{'A': 3, 'B': 5, 'C': 3, 'D': 2},{'A': 2, 'B': 4, 'C': 3, 'D': 9}]
df1 = pd.DataFrame(a)
b = [{'F': 0, 'M': 4, 'B': 2, 'C': 8 },{'F': 2, 'M': 4, 'B': 3, 'C': 9}]
df2 = pd.DataFrame(b)
df1.columns
>> Index(['A', 'B', 'C', 'D'], dtype='object')
df2.columns
>> Index(['B', 'C', 'F', 'M'], dtype='object')
(df2.columns).isin(df1.columns)
>> array([ True, True, False, False])
How do I apply that NumPy boolean array to the Index object so that it returns just a list of the columns in common?
DataFrame.equals(): this method compares two Series or DataFrames to check whether they have the same shape and elements. NaNs in the same location are considered equal. The row and column labels do not need to have the same type, as long as they compare equal, but corresponding columns must have the same dtype.
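As a rough illustration of those three rules, here is a minimal sketch. The frame names (left, same, float_labels, ints, floats) are made up for this example and are not from the question.

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({1: [10.0, np.nan], 2: [20.0, 30.0]})

# Same values, with NaN in the same position.
same = pd.DataFrame({1: [10.0, np.nan], 2: [20.0, 30.0]})
nan_equal = left.equals(same)

# Column labels are floats instead of ints, but they compare equal.
float_labels = pd.DataFrame({1.0: [10.0, np.nan], 2.0: [20.0, 30.0]})
labels_equal = left.equals(float_labels)

# Same numbers stored with a different dtype (int64 vs float64).
ints = pd.DataFrame({1: [10, 20], 2: [20, 30]})
floats = ints.astype(float)
dtype_equal = ints.equals(floats)

print(nan_equal, labels_equal, dtype_equal)
```

The first two comparisons should report equality and the last one should not, since only the dtype differs there.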
To merge two pandas DataFrames on a common column, use the merge() function and pass the column name as the on parameter.
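A minimal sketch of such a merge, using the frames from the question. The choice of 'B' as the key, how='outer', and the suffix strings are assumptions made for illustration only; the suffixes parameter is what keeps the other shared column ('C') from appearing twice under the same name.

```python
import pandas as pd

df1 = pd.DataFrame([{'A': 3, 'B': 5, 'C': 3, 'D': 2},
                    {'A': 2, 'B': 4, 'C': 3, 'D': 9}])
df2 = pd.DataFrame([{'F': 0, 'M': 4, 'B': 2, 'C': 8},
                    {'F': 2, 'M': 4, 'B': 3, 'C': 9}])

# Merge on the shared key 'B'. The remaining shared column 'C' gets an
# explicit suffix per source frame instead of the default '_x' / '_y'.
# how='outer' keeps all rows, since the 'B' values do not overlap here.
merged = pd.merge(df1, df2, on='B', how='outer', suffixes=('_df1', '_df2'))

print(list(merged.columns))
```

Without an explicit suffixes argument, pandas falls back to '_x' and '_y' for overlapping non-key columns, which is harder to read when many columns collide.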
Use numpy.intersect1d or Index.intersection:
import numpy as np
a = np.intersect1d(df2.columns, df1.columns)
print (a)
['B' 'C']
a = df2.columns.intersection(df1.columns)
print (a)
Index(['B', 'C'], dtype='object')
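Alternative syntax for the latter option in older pandas versions (using & as a set operation on an Index was deprecated in pandas 1.x and no longer performs a set intersection as of pandas 2.0, so prefer intersection):
df1.columns & df2.columns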
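To answer the question as literally asked, the boolean array from isin can also be used to index the Index object directly. This is a small sketch using the question's frames; the variable names mask and common are introduced here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame([{'A': 3, 'B': 5, 'C': 3, 'D': 2},
                    {'A': 2, 'B': 4, 'C': 3, 'D': 9}])
df2 = pd.DataFrame([{'F': 0, 'M': 4, 'B': 2, 'C': 8},
                    {'F': 2, 'M': 4, 'B': 3, 'C': 9}])

# Boolean array: True where a df2 column name also appears in df1.
mask = df2.columns.isin(df1.columns)

# Index objects accept boolean-array indexing, like NumPy arrays;
# tolist() turns the filtered Index into a plain Python list.
common = df2.columns[mask].tolist()

print(common)
```

This keeps the shared names in the order they appear in df2, which can matter if the list is later used to select or drop columns before merging.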