Drop columns from Pandas dataframe if they are not in specific list

Tags:

pandas

I have a pandas dataframe with several columns. I want to drop every column that is not present in a given list.

pandas dataframe columns:

list(pandas_df.columns.values)

Result:

['id', 'name', 'region', 'city']

And my expected column names:

final_table_columns = ['id', 'name', 'year']

After the operation, the result should be:

list(pandas_df.columns.values)

['id', 'name']

650

asked Jul 04 '19 16:07

mgnfcnt2

3 Answers

You could use a list comprehension to build the list of column names to pass to drop():

final_table_columns = ['id', 'name', 'year']
df = df.drop(columns=[col for col in df if col not in final_table_columns])

To do it in-place:

df.drop(columns=[col for col in df if col not in final_table_columns], inplace=True)
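For anyone who wants to try this out, here is a minimal sketch of the list-comprehension approach. The sample rows are made up for illustration (only the column names come from the question), and `to_drop` and `result` are names introduced here.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "id": [1, 2],
    "name": ["a", "b"],
    "region": ["north", "south"],
    "city": ["Oslo", "Rome"],
})
final_table_columns = ["id", "name", "year"]

# Iterating over a DataFrame yields its column labels, so this collects
# every column that is missing from the allow-list.
to_drop = [col for col in df if col not in final_table_columns]

# drop() returns a new DataFrame; the original df is left unchanged.
result = df.drop(columns=to_drop)

print(to_drop)
print(list(result.columns))
```

Because only labels that already exist in the dataframe end up in `to_drop`, the extra 'year' entry in the list does not cause an error.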

answered Oct 19 '22 06:10

ilja

Use Index.intersection to find the intersection of an index and a list of (column) labels:

pandas_df = pandas_df[pandas_df.columns.intersection(final_table_columns)]
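A short sketch of why the intersection helps, assuming a recent pandas version: selecting with the raw list fails because 'year' is not a column, while the intersection contains only labels that actually exist. The sample rows and the names `kept` and `direct_selection_failed` are illustrative only.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
pandas_df = pd.DataFrame({
    "id": [1, 2],
    "name": ["a", "b"],
    "region": ["north", "south"],
    "city": ["Oslo", "Rome"],
})
final_table_columns = ["id", "name", "year"]

# Selecting with the raw list raises KeyError because 'year' is missing.
try:
    pandas_df[final_table_columns]
    direct_selection_failed = False
except KeyError:
    direct_selection_failed = True

# The intersection contains only labels present in both, so it is safe.
kept = pandas_df.columns.intersection(final_table_columns)
pandas_df = pandas_df[kept]

print(direct_selection_failed)
print(list(pandas_df.columns))
```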

192

answered Oct 19 '22 05:10

unutbu

To do it in-place, consider Index.difference. This was not documented in any prior answer.

df.drop(columns=df.columns.difference(final_table_columns), inplace=True)

To create a new dataframe, Index.intersection also works.

df_final = df.drop(columns=df.columns.difference(final_table_columns))

df_final = df[df.columns.intersection(final_table_columns)]  # credited to unutbu
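As a rough, self-contained sketch of the in-place variant: the sample rows are invented, and `extra` is a name introduced here for the labels to remove. `Index.difference` may return its labels in sorted order, but since they are only used for dropping, the order of the remaining columns is not affected.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "id": [1, 2],
    "name": ["a", "b"],
    "region": ["north", "south"],
    "city": ["Oslo", "Rome"],
})
final_table_columns = ["id", "name", "year"]

# Labels that are in df.columns but not in the allow-list.
extra = df.columns.difference(final_table_columns)

# Mutates df directly instead of returning a new DataFrame.
df.drop(columns=extra, inplace=True)

print(sorted(extra))
print(list(df.columns))
```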

answered Oct 19 '22 07:10

Asclepius
