 

pandas: return columns in a DataFrame that are not in another DataFrame

Tags:

python

pandas

I have two dataframes that look like this:

import pandas as pd

df_1 = pd.DataFrame({
    'A': [1.0, 2.0, 3.0, 4.0],
    'B': [100, 200, 300, 400],
    'C': [2, 3, 4, 5],
})

df_2 = pd.DataFrame({
    'B': [1.0, 2.0, 3.0, 4.0],
    'C': [100, 200, 300, 400],
    'D': [2, 3, 4, 5],
})

Now, if I use the pandas .isin method, I can do something nifty like this:

>>> df_2.columns.isin(df_1.columns)
array([ True,  True, False])

Columns B and C from df_2 exist in df_1, while D does not.

My question is: does anyone know a way to return the labels of the columns that exist in df_2 but not in df_1?

Something like this:

array([u'D'], dtype=string)

Thank you in advance!

cgclip asked Mar 26 '17 12:03




1 Answer

Pandas Index objects have set-like properties, so you can directly do:

df_2.columns.difference(df_1.columns)
Index(['D'], dtype='object')
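The question's own .isin mask also works: inverting it with ~ and using it to index the columns gives the same labels, in df_2's original column order. Note that .difference() sorts its result by default when the labels are sortable; passing sort=False keeps the original order instead. A minimal sketch, reusing the frames from the question:

```python
import pandas as pd

df_1 = pd.DataFrame({'A': [1.0, 2.0, 3.0, 4.0],
                     'B': [100, 200, 300, 400],
                     'C': [2, 3, 4, 5]})
df_2 = pd.DataFrame({'B': [1.0, 2.0, 3.0, 4.0],
                     'C': [100, 200, 300, 400],
                     'D': [2, 3, 4, 5]})

# Set difference of the column labels (sorted by default when possible)
via_difference = df_2.columns.difference(df_1.columns)

# Same set difference, but keeping df_2's original column order
via_difference_unsorted = df_2.columns.difference(df_1.columns, sort=False)

# Boolean-mask alternative: invert the .isin() mask and index the labels
via_isin = df_2.columns[~df_2.columns.isin(df_1.columns)]

print(list(via_difference))  # ['D']
print(list(via_isin))        # ['D']

# The resulting Index can be used directly to select those columns from df_2
print(df_2[via_isin])
```

One practical difference: .difference() also drops duplicate labels, whereas the mask approach keeps every matching column as it appears in df_2.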

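You can also compute the intersection, union and symmetric difference. Older pandas versions allowed the operators &, | and ^ for this, but that use was later deprecated, and recent pandas versions treat these operators as element-wise logical operations instead. Use the named methods:

df_1.columns.intersection(df_2.columns)
Index(['B', 'C'], dtype='object')

df_1.columns.union(df_2.columns)
Index(['A', 'B', 'C', 'D'], dtype='object')

df_1.columns.symmetric_difference(df_2.columns)
Index(['A', 'D'], dtype='object')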
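The named Index methods intersection(), difference() and symmetric_difference() can be wrapped in a small helper that reports every overlap in one call. The function name compare_columns and its dictionary keys are hypothetical, chosen only for this sketch:

```python
import pandas as pd


def compare_columns(left: pd.DataFrame, right: pd.DataFrame) -> dict:
    """Hypothetical helper: summarise how two frames' column labels overlap."""
    lcols, rcols = left.columns, right.columns
    return {
        # Labels only in `left`, in left's column order
        'only_left': list(lcols.difference(rcols, sort=False)),
        # Labels only in `right`, in right's column order
        'only_right': list(rcols.difference(lcols, sort=False)),
        # Labels present in both frames
        'common': list(lcols.intersection(rcols, sort=False)),
        # Labels present in exactly one frame (sorted by default)
        'either_not_both': list(lcols.symmetric_difference(rcols)),
    }


df_1 = pd.DataFrame({'A': [1.0], 'B': [100], 'C': [2]})
df_2 = pd.DataFrame({'B': [1.0], 'C': [100], 'D': [2]})
print(compare_columns(df_1, df_2))
```

Returning plain lists keeps the result easy to print or compare; drop the list() calls if you want to keep Index objects for further column selection.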

There used to be a - operator for set difference as well, but it was deprecated in favour of .difference() and no longer performs a set difference in current pandas:

df_2.columns - df_1.columns
FutureWarning: using '-' to provide set differences with Indexes is deprecated, use .difference()
Index(['D'], dtype='object')
jrjc answered Nov 01 '22 03:11