What is the best way to remove columns in pandas

Tags:

python

pandas

dataframe

I am asking this question for my own learning. As far as I know, the following are the different ways to remove columns from a pandas DataFrame.

Option - 1:

df=pd.DataFrame({'a':[1,2,3,4,5],'b':[6,7,8,9,10],'c':[11,12,13,14,15]})
del df['a']

Option - 2:

df=pd.DataFrame({'a':[1,2,3,4,5],'b':[6,7,8,9,10],'c':[11,12,13,14,15]})
df=df.drop('a', axis=1)

Option - 3:

df=pd.DataFrame({'a':[1,2,3,4,5],'b':[6,7,8,9,10],'c':[11,12,13,14,15]})
df=df[['b','c']]

What is the best approach among these?
Any other approaches to achieve the same?

646

asked Jul 04 '18 07:07

Mohamed Thasin ah

4 Answers

According to the docs:

DataFrame is a 2-dimensional labeled data structure with columns of potentially different types.

And pandas.DataFrame.drop:

Drop specified labels from rows or columns.

So, I think we should stick with df.drop. Why? I think the pros are:

It gives us more control over the removal:

# Returns a NEW DataFrame object and leaves the original `df` untouched.
df.drop('a', axis=1)
# Modifies `df` in place and returns `None`.
df.drop('a', axis=1, inplace=True)

It can handle more complicated cases with its arguments. E.g. with level, we can delete from a MultiIndex, and with errors, we can control whether a missing label raises a KeyError.
It's a more unified and object-oriented way.
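
As a rough illustration of the second point, here is a minimal sketch of the errors and level arguments. The frames and the label 'missing' are made up for the example and are not from the question.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]})

# errors='ignore' skips labels that do not exist instead of raising KeyError.
trimmed = df.drop(columns=['a', 'missing'], errors='ignore')
print(list(trimmed.columns))  # ['b', 'c']

# With MultiIndex columns, `level` selects which level the label is matched against.
wide = pd.DataFrame(
    [[1, 2, 3, 4]],
    columns=pd.MultiIndex.from_product([['x', 'y'], ['left', 'right']]),
)
no_right = wide.drop(columns='right', level=1)
print(list(no_right.columns))  # [('x', 'left'), ('y', 'left')]
```

With the default errors='raise', the first call would fail with a KeyError because 'missing' is not a column.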

And just like @jezrael noted in his answer:

Option 1: Using the del keyword is limited.

Option 3: df=df[['b','c']] isn't really a deletion. It first selects data by indexing with the [] syntax, then unbinds the name df from the original DataFrame and binds it to the new one (i.e. df[['b','c']]).
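
The difference between rebinding and deleting can be seen with a second name pointing at the same object. This is only a sketch; the name alias is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})
alias = df  # second name for the same DataFrame object

# Option 3: selection builds a new object and rebinds the name `df`.
df = df[['b', 'c']]
print(df is alias)          # False
print(list(alias.columns))  # ['a', 'b', 'c'] (the original still has 'a')

# Option 1: `del` removes the column from the object itself.
del alias['a']
print(list(alias.columns))  # ['b', 'c']
```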

186

answered Oct 18 '22 23:10

YaOzI

The recommended way to delete a column or row from a pandas DataFrame is to use drop.

To delete a column,

df.drop('column_name', axis=1, inplace=True)

To delete a row,

df.drop('row_index', axis=0, inplace=True)

You can refer to this post for a detailed discussion of column deletion approaches.
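
For what it's worth, drop also accepts columns= and index= keywords, which some people find easier to read than axis=1 and axis=0. A small sketch with made-up row labels, returning new frames rather than using inplace=True:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]}, index=['r1', 'r2', 'r3'])

# `columns=` is equivalent to passing the labels with axis=1.
without_a = df.drop(columns='a')

# `index=` is equivalent to axis=0; a list drops several rows at once.
without_rows = df.drop(index=['r1', 'r3'])

print(list(without_a.columns))   # ['b']
print(list(without_rows.index))  # ['r2']
```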

answered Oct 18 '22 23:10

razmik

From a speed perspective, option 1 seems to be the best. Obviously, based on the other answers, that doesn't mean it's actually the best option.

In [52]: import timeit

In [53]: s1 = """
    ...: import pandas as pd
    ...: df=pd.DataFrame({'a':[1,2,3,4,5],'b':[6,7,8,9,10],'c':[11,12,13,14,15]})
    ...: del df['a']
    ...: """

In [54]: s2 = """
    ...: import pandas as pd
    ...: df=pd.DataFrame({'a':[1,2,3,4,5],'b':[6,7,8,9,10],'c':[11,12,13,14,15]})
    ...: df=df.drop('a', axis=1)
    ...: """

In [55]: s3 = """
    ...: import pandas as pd
    ...: df=pd.DataFrame({'a':[1,2,3,4,5],'b':[6,7,8,9,10],'c':[11,12,13,14,15]})
    ...: df=df[['b','c']]
    ...: """

In [56]: timeit.timeit(stmt=s1, number=100000)
Out[56]: 53.37321400642395

In [57]: timeit.timeit(stmt=s2, number=100000)
Out[57]: 79.68139410018921

In [58]: timeit.timeit(stmt=s3, number=100000)
Out[58]: 76.25269913673401
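
Note that the statements above also time the import and the DataFrame construction on every run. A possible variation, sketched here under the assumption that only the removal step is of interest, moves those into setup and starts each statement from a fresh copy (needed because del mutates its target). The baseline entry and the repeat count of 1000 are choices made for this sketch, and the numbers will vary by machine and pandas version.

```python
import timeit

# Setup runs once per timing and is excluded from the measurement.
setup = """
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [6, 7, 8, 9, 10], 'c': [11, 12, 13, 14, 15]})
"""

# Each statement starts from a fresh copy, because `del` mutates its target.
statements = {
    'copy only (baseline)': "d = df.copy()",
    'del': "d = df.copy(); del d['a']",
    'drop': "d = df.copy(); d = d.drop('a', axis=1)",
    'select': "d = df.copy(); d = d[['b', 'c']]",
}

timings = {
    name: timeit.timeit(stmt=stmt, setup=setup, number=1000)
    for name, stmt in statements.items()
}
for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s")
```

Subtracting the baseline from each of the other timings gives a rough estimate of the cost of the removal itself.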

answered Oct 19 '22 00:10

aydow

In my opinion, options 2 and 3 are the best, because the first has limits: you can remove only one column at a time and cannot use dot notation (del df.a).

The third solution is not deleting but selecting, and piRSquared wrote a nice answer with multiple possible solutions based on the same idea.
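
A few variations on the selecting idea, plus a multi-column drop, might look like the sketch below. These are common pandas idioms and not necessarily the ones from piRSquared's answer.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})

# Boolean mask over the column labels; keeps the original column order.
by_mask = df.loc[:, df.columns != 'a']

# Index.difference returns the remaining labels in sorted order.
by_difference = df[df.columns.difference(['a'])]

# drop accepts a list, so several columns can go in one call.
only_c = df.drop(columns=['a', 'b'])

print(list(by_mask.columns))        # ['b', 'c']
print(list(by_difference.columns))  # ['b', 'c']
print(list(only_c.columns))         # ['c']
```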

answered Oct 18 '22 22:10

jezrael
