
Drop columns from Pandas dataframe if they are not in specific list

Tags:

pandas

I have a pandas DataFrame with several columns. I want to drop every column that is not present in a given list.

pandas dataframe columns:

list(pandas_df.columns.values)

Result:

['id', 'name', 'region', 'city']

And my expected column names:

final_table_columns = ['id', 'name', 'year']

After the operation, the result should be:

list(pandas_df.columns.values)

['id', 'name']
mgnfcnt2 asked Jul 04 '19 16:07




3 Answers

You could use a list comprehension to build the list of column names to pass to drop():

final_table_columns = ['id', 'name', 'year']
df = df.drop(columns=[col for col in df if col not in final_table_columns])

To do it in-place:

df.drop(columns=[col for col in df if col not in final_table_columns], inplace=True)
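As a minimal, self-contained sketch of the list-comprehension approach: the sample rows below are made up for illustration, and only the column names come from the question. Iterating over a DataFrame yields its column labels, which is why `for col in df` works here.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame(
    {"id": [1, 2], "name": ["a", "b"], "region": ["x", "y"], "city": ["p", "q"]}
)
final_table_columns = ["id", "name", "year"]

# Iterating over a DataFrame yields its column labels, in order.
to_drop = [col for col in df if col not in final_table_columns]

# drop() returns a new frame; df itself is unchanged.
result = df.drop(columns=to_drop)

print(to_drop)               # ['region', 'city']
print(list(result.columns))  # ['id', 'name']
```

Note that 'year' never causes a problem: it is not a column of df, so it never ends up in the list passed to drop().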
ilja answered Oct 19 '22 06:10


Use Index.intersection to find the intersection of an index and a list of (column) labels:

pandas_df = pandas_df[pandas_df.columns.intersection(final_table_columns)]
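A short sketch of why the intersection step matters, using made-up rows with the question's column names: selecting with the raw list fails because 'year' is not a column, while the intersection only keeps labels that exist on both sides.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
pandas_df = pd.DataFrame(
    {"id": [1, 2], "name": ["a", "b"], "region": ["x", "y"], "city": ["p", "q"]}
)
final_table_columns = ["id", "name", "year"]

# Plain selection raises KeyError because 'year' is not a column.
try:
    pandas_df[final_table_columns]
    selection_failed = False
except KeyError:
    selection_failed = True

# intersection keeps only labels present on both sides, so 'year' is ignored.
kept = pandas_df.columns.intersection(final_table_columns)
pandas_df = pandas_df[kept]

print(selection_failed)
print(list(pandas_df.columns))
```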
unutbu answered Oct 19 '22 05:10


To do it in-place, consider Index.difference. This was not documented in any prior answer.

df.drop(columns=df.columns.difference(final_table_columns), inplace=True)

To create a new dataframe, Index.intersection also works.

df_final = df.drop(columns=df.columns.difference(final_table_columns))

df_final = df[df.columns.intersection(final_table_columns)]  # credited to unutbu
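The variants above can be compared side by side in a small sketch. The sample rows are invented for illustration; the names `extra`, `df_alt` and `returned` are introduced here and are not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame(
    {"id": [1, 2], "name": ["a", "b"], "region": ["x", "y"], "city": ["p", "q"]}
)
final_table_columns = ["id", "name", "year"]

# Labels in df.columns that are not in the list.
extra = df.columns.difference(final_table_columns)

# New frame via drop; the original frame is left untouched.
df_final = df.drop(columns=extra)

# The same columns selected via intersection.
df_alt = df[df.columns.intersection(final_table_columns)]

# In-place variant: mutates df and returns None.
returned = df.drop(columns=extra, inplace=True)

print(sorted(extra))
print(list(df_final.columns), list(df_alt.columns), list(df.columns))
```

Because the in-place call returns None, its result should not be assigned back to df.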
Asclepius answered Oct 19 '22 07:10