Selecting/excluding sets of columns in pandas [duplicate]

Tags: python, pandas, dataframe

I would like to create views or dataframes from an existing dataframe based on column selections.

For example, I would like to create a dataframe df2 from a dataframe df1 that holds all columns from it except two of them. I tried doing the following, but it didn't work:

    import numpy as np
    import pandas as pd

    # Create a dataframe with columns A, B, C and D
    df = pd.DataFrame(np.random.randn(100, 4), columns=list('ABCD'))

    # Try to create a second dataframe df2 from df with all columns except 'B' and 'D'
    my_cols = set(df.columns)
    my_cols.remove('B').remove('D')

    # This returns an error ("unhashable type: set")
    df2 = df[my_cols]

What am I doing wrong? More generally, what mechanisms does pandas have to support selecting and excluding arbitrary sets of columns from a dataframe?

599

asked Feb 18 '13 16:02

Amelio Vazquez-Reina

2 Answers

You can either drop the columns you do not need or select the ones you need:

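    # Each option returns a new dataframe and leaves df unchanged.

    # Using DataFrame.drop by position (columns 1 and 2, i.e. 'B' and 'C')
    df1 = df.drop(df.columns[[1, 2]], axis=1)

    # Drop by name
    df1 = df.drop(['B', 'C'], axis=1)

    # Select the ones you want
    df1 = df[['A', 'D']]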

200

answered Oct 20 '22 00:10

Amrita Sawant

pandas Index objects have a method called difference. It returns the original columns with the columns passed as an argument removed.

Here, the result is used to remove columns B and D from df:

df2 = df[df.columns.difference(['B', 'D'])]

Note that it's a set-based method: duplicate column names will cause issues, and the result is sorted by default, so the column order may change.
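To illustrate the ordering caveat, here is a small sketch. It assumes a reasonably recent pandas version in which Index.difference accepts a sort parameter; the toy dataframe and the names sorted_cols and ordered_cols are invented for the example.

```python
import pandas as pd

# Columns deliberately not in alphabetical order.
df = pd.DataFrame([[1, 2, 3, 4]], columns=['D', 'C', 'B', 'A'])

# Default behaviour: the remaining labels come back sorted.
sorted_cols = df.columns.difference(['B'])

# sort=False keeps the labels in their original order
# (assumes a pandas version that supports the sort parameter).
ordered_cols = df.columns.difference(['B'], sort=False)

df_sorted = df[sorted_cols]
df_ordered = df[ordered_cols]
```

If the original column order matters, sort=False or a plain list comprehension over df.columns may be the safer choice.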

Advantage over drop: you don't create a copy of the entire dataframe when you only need the list of columns. For instance, in order to drop duplicates on a subset of columns:

    # may create a copy of the dataframe
    subset = df.drop(['B', 'D'], axis=1).columns

    # does not create a copy of the dataframe
    subset = df.columns.difference(['B', 'D'])

    df = df.drop_duplicates(subset=subset)
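For a concrete picture of what the subset does, here is a minimal sketch on made-up data. The values and the name deduped are assumptions for illustration only.

```python
import pandas as pd

# Rows 0 and 1 agree on A and C but differ on B and D.
df = pd.DataFrame({
    'A': [1, 1, 2],
    'B': [10, 20, 30],
    'C': ['x', 'x', 'y'],
    'D': [0.1, 0.2, 0.3],
})

# Compare rows on every column except B and D.
subset = df.columns.difference(['B', 'D'])

# Row 1 counts as a duplicate of row 0 and is dropped;
# the first occurrence is kept by default.
deduped = df.drop_duplicates(subset=subset)
```

All four columns are still present in deduped; the subset only controls which columns are compared.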

answered Oct 20 '22 01:10

IanS
