Keep certain columns in a pandas DataFrame, deleting everything else

Tags:

pandas

Say I have a data table

       1  2  3  4  5  6 ..  n
    A  x  x  x  x  x  x ..  x
    B  x  x  x  x  x  x ..  x
    C  x  x  x  x  x  x ..  x

And I want to slim it down so that I only have, say, columns 3 and 5, deleting all the others and maintaining the structure. How could I do this with pandas? I think I understand how to delete a single column, but I don't know how to keep a select few and delete all the others.

447

asked May 17 '13 19:05

Matt

2 Answers

If you have a list of columns you can just select those:

    In [11]: df
    Out[11]:
       1  2  3  4  5  6
    A  x  x  x  x  x  x
    B  x  x  x  x  x  x
    C  x  x  x  x  x  x

    In [12]: col_list = [3, 5]

    In [13]: df = df[col_list]

    In [14]: df
    Out[14]:
       3  5
    A  x  x
    B  x  x
    C  x  x
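For a version that can be pasted and run, here is a minimal sketch of the same selection. The `'x'` placeholder data and the `.copy()` call are my own assumptions, not part of the answer above; `.copy()` is there so the slimmed frame is independent of the original and later assignments to it do not raise `SettingWithCopyWarning`.

```python
import pandas as pd

# Toy frame mirroring the question: rows A-C, integer column labels 1-6.
df = pd.DataFrame("x", index=list("ABC"), columns=range(1, 7))

col_list = [3, 5]

# Select the wanted columns. The .copy() is an addition for illustration;
# it detaches the result from the original frame.
slim = df[col_list].copy()

print(slim)
```

Assigning to a new name such as `slim` (a hypothetical name) leaves the original `df` untouched; reassigning to `df`, as in the session above, discards the other columns.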

190

answered Oct 06 '22 07:10

Andy Hayden

How do I keep certain columns in a pandas DataFrame, deleting everything else?

The answer to this question is the same as the answer to "How do I delete certain columns in a pandas DataFrame?" Here are some additional options to those mentioned so far, along with timings.

`DataFrame.loc`

One simple option is selection, as mentioned in other answers:

    # Setup.
    df
       1  2  3  4  5  6
    A  x  x  x  x  x  x
    B  x  x  x  x  x  x
    C  x  x  x  x  x  x

    cols_to_keep = [3, 5]

    df[cols_to_keep]

       3  5
    A  x  x
    B  x  x
    C  x  x

Or,

    df.loc[:, cols_to_keep]

       3  5
    A  x  x
    B  x  x
    C  x  x

`DataFrame.reindex` with `axis=1` or `'columns'` (0.21+)

However, we also have reindex. In recent versions, you pass the columns to keep together with axis=1, and every column not listed is dropped:

    df.reindex(cols_to_keep, axis=1)
    # df.reindex(cols_to_keep, axis='columns')

    # for versions < 0.21, use
    # df.reindex(columns=cols_to_keep)

       3  5
    A  x  x
    B  x  x
    C  x  x

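On older versions, you can also use reindex_axis: df.reindex_axis(cols_to_keep, axis=1). Note that reindex_axis was deprecated in 0.21 and removed in pandas 1.0, so it is only relevant to legacy code.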
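One practical difference between plain selection and reindex is how each handles a label that is not in the frame. The sketch below assumes a keep-list containing a nonexistent column 7 (a made-up typo for illustration): selection raises `KeyError`, while `reindex` silently creates the column and fills it with NaN.

```python
import pandas as pd

df = pd.DataFrame("x", index=list("ABC"), columns=range(1, 7))

# Label 7 does not exist in df (a hypothetical typo in the keep-list).
cols_with_typo = [3, 7]

# Plain selection refuses to guess and raises KeyError.
try:
    df[cols_with_typo]
    selection_failed = False
except KeyError:
    selection_failed = True

# reindex does not raise; it adds column 7 filled with NaN.
reindexed = df.reindex(cols_with_typo, axis=1)

print(selection_failed)
print(reindexed)
```

If a missing column should be treated as an error, selection is probably the safer choice; if the output must always have a fixed set of columns, reindex may fit better.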

`DataFrame.drop`

Another alternative is to use drop, computing the columns to remove with pd.Index.difference:

    # df.drop(cols_to_drop, axis=1)
    df.drop(df.columns.difference(cols_to_keep), axis=1)

       3  5
    A  x  x
    B  x  x
    C  x  x
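A detail worth checking when choosing between these is column order. In the sketch below, the keep-list is deliberately written as `[5, 3]` (my own choice for illustration): selection follows the order of the list, whereas drop leaves the surviving columns in the frame's original order.

```python
import pandas as pd

df = pd.DataFrame("x", index=list("ABC"), columns=range(1, 7))

# Keep-list given in reverse order on purpose.
cols_to_keep = [5, 3]

# Selection returns columns in the order of the list.
by_selection = df[cols_to_keep]

# drop removes the complement, so the remaining columns keep df's order.
by_drop = df.drop(df.columns.difference(cols_to_keep), axis=1)

print(list(by_selection.columns))  # [5, 3]
print(list(by_drop.columns))       # [3, 5]
```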

Performance

[Figure: timing plot comparing `__getitem__`, `loc`, `reindex`, and `drop` as N grows, with a logarithmic Y-axis]

The methods are roughly the same in terms of performance; reindex is faster for smaller N, while drop is faster for larger N. Because the Y-axis is logarithmic, read the gaps between curves as ratios rather than absolute differences.

Setup and Code

    import numpy as np
    import pandas as pd
    import perfplot

    def make_sample(n):
        # Build an n x n frame of 'x' and pick a random quarter of its columns.
        np.random.seed(0)
        df = pd.DataFrame(np.full((n, n), 'x'))
        cols_to_keep = np.random.choice(df.columns, max(2, n // 4), replace=False)

        return df, cols_to_keep

    perfplot.show(
        setup=lambda n: make_sample(n),
        kernels=[
            lambda inp: inp[0][inp[1]],
            lambda inp: inp[0].loc[:, inp[1]],
            lambda inp: inp[0].reindex(inp[1], axis=1),
            lambda inp: inp[0].drop(inp[0].columns.difference(inp[1]), axis=1)
        ],
        labels=['__getitem__', 'loc', 'reindex', 'drop'],
        n_range=[2**k for k in range(2, 13)],
        xlabel='N',
        logy=True,
        # drop preserves the original column order, so align before comparing.
        equality_check=lambda x, y: (x.reindex_like(y) == y).values.all()
    )
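If perfplot is not installed, a rough comparison is possible with the standard-library `timeit` module. This is only a sketch under my own assumptions (a single size `n = 256`, 20 calls per run, best of 3 runs); it is not the benchmark that produced the plot above, and the numbers will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

n = 256  # arbitrary size chosen for illustration
rng = np.random.default_rng(0)
df = pd.DataFrame(np.full((n, n), "x"))
cols_to_keep = rng.choice(df.columns, n // 4, replace=False)

# The same four approaches as in the perfplot kernels.
candidates = {
    "__getitem__": lambda: df[cols_to_keep],
    "loc": lambda: df.loc[:, cols_to_keep],
    "reindex": lambda: df.reindex(cols_to_keep, axis=1),
    "drop": lambda: df.drop(df.columns.difference(cols_to_keep), axis=1),
}

# Best-of-3 total time for 20 calls of each approach.
timings = {
    name: min(timeit.repeat(fn, number=20, repeat=3))
    for name, fn in candidates.items()
}

for name, seconds in timings.items():
    print(f"{name:12s} {seconds:.6f} s")
```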

answered Oct 06 '22 05:10

cs95

