
Deleting multiple columns based on column names in Pandas

Tags: python, pandas

I have some data and when I import it, I get the following unneeded columns. I'm looking for an easy way to delete all of these.

'Unnamed: 24', 'Unnamed: 25', 'Unnamed: 26', 'Unnamed: 27',
'Unnamed: 28', 'Unnamed: 29', 'Unnamed: 30', 'Unnamed: 31',
'Unnamed: 32', 'Unnamed: 33', 'Unnamed: 34', 'Unnamed: 35',
'Unnamed: 36', 'Unnamed: 37', 'Unnamed: 38', 'Unnamed: 39',
'Unnamed: 40', 'Unnamed: 41', 'Unnamed: 42', 'Unnamed: 43',
'Unnamed: 44', 'Unnamed: 45', 'Unnamed: 46', 'Unnamed: 47',
'Unnamed: 48', 'Unnamed: 49', 'Unnamed: 50', 'Unnamed: 51',
'Unnamed: 52', 'Unnamed: 53', 'Unnamed: 54', 'Unnamed: 55',
'Unnamed: 56', 'Unnamed: 57', 'Unnamed: 58', 'Unnamed: 59',
'Unnamed: 60'

The columns are 0-indexed, so I tried something like:

df.drop(df.columns[[22, 23, 24, 25,
                    26, 27, 28, 29, 30, 31, 32, 55]], axis=1, inplace=True)

But this isn't very efficient. I tried writing some for loops, but that struck me as un-idiomatic Pandas, hence my question here.

I've seen some similar examples (Drop multiple columns in pandas), but they don't answer my question.

Peadar Coyle asked Feb 16 '15 09:02


People also ask

How do you delete multiple unwanted columns in Pandas?

You can delete one or several columns of a DataFrame. To remove a single column, use the del keyword, the pop() method, or the drop() method. To remove several columns at once, use drop().
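The three options can be compared side by side. This is a minimal sketch on a made-up frame; the column names "a" to "e" are illustrative assumptions, not taken from the question.

```python
import pandas as pd

# Hypothetical five-column frame, used only for illustration.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6], "d": [7, 8], "e": [9, 10]})

del df["a"]                        # removes "a" in place, returns nothing
popped = df.pop("b")               # removes "b" in place and returns it as a Series
df = df.drop(columns=["c", "d"])   # returns a new frame without "c" and "d"

remaining = list(df.columns)
```

Note that del and pop() modify the frame in place and take one column at a time, while drop() accepts a list and returns a new frame unless inplace=True is passed.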

How do I get rid of 3 columns in Pandas?

Use the drop() method and pass the columns to remove as a list. drop() is versatile: it can remove rows as well as columns.

How do I delete a range of columns in Pandas?

To drop multiple columns by index, use df.columns[[index1, index2, indexn]] to look up the column names at those positions and pass the result to the drop method. Positions are 0-based: use 0 for the first column, 1 for the second, and so on.


4 Answers

By far the simplest approach is:

yourdf.drop(['columnheading1', 'columnheading2'], axis=1, inplace=True)
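A runnable sketch of the same call, assuming a small made-up frame (the "keep" and "not_there" names are hypothetical). It uses the columns= keyword, which is equivalent to passing the labels with axis=1, and errors="ignore", which skips labels that are missing instead of raising a KeyError.

```python
import pandas as pd

# Hypothetical frame; only the two "columnheading" names come from the answer above.
yourdf = pd.DataFrame({
    "keep": [1, 2],
    "columnheading1": [3, 4],
    "columnheading2": [5, 6],
})

# Without inplace=True, drop() leaves yourdf untouched and returns a new frame.
trimmed = yourdf.drop(
    columns=["columnheading1", "columnheading2", "not_there"],
    errors="ignore",  # "not_there" does not exist and is silently skipped
)
```

Returning a new frame rather than using inplace=True keeps the original available, which can be handy while experimenting.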
Philipp Schwarz answered Oct 19 '22 17:10


I don't know what you mean by inefficient, but if you mean in terms of typing, it could be easier to just select the columns of interest and assign the result back to the df:

df = df[cols_of_interest]

Where cols_of_interest is a list of the columns you care about.
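As a small illustration of the select-and-reassign approach, assuming made-up column names ("id" and "value" are hypothetical):

```python
import pandas as pd

# Hypothetical frame with one unwanted column.
df = pd.DataFrame({"id": [1, 2], "value": [0.5, 1.5], "Unnamed: 2": [None, None]})

cols_of_interest = ["id", "value"]

# Selecting with a list of labels keeps only those columns, in the listed order.
# .copy() makes the result an independent frame rather than a selection of df.
df = df[cols_of_interest].copy()
```

This tends to be the shorter option when the columns worth keeping are fewer than the columns to discard.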

Or you can slice the columns and pass this to drop:

df.drop(df.loc[:, 'Unnamed: 24':'Unnamed: 60'].head(0).columns, axis=1)

The call to head just selects 0 rows, as we're only interested in the column names rather than the data.
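A minimal sketch of the slice-then-drop idea using the label-based indexer .loc, on a made-up frame with only three 'Unnamed' columns. It assumes the column labels are unique, which label slicing on an unsorted index requires.

```python
import pandas as pd

# Hypothetical frame; the real data in the question has many more columns.
df = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6]],
    columns=["a", "b", "Unnamed: 24", "Unnamed: 25", "Unnamed: 26", "c"],
)

# Label slices with .loc include BOTH endpoints, unlike positional slices.
unwanted = df.loc[:, "Unnamed: 24":"Unnamed: 26"].columns

trimmed = df.drop(columns=unwanted)
```

Because only the column labels are needed, taking .columns directly from the sliced frame is enough here.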

update

Another method: It would be simpler to use the boolean mask from str.contains and invert it to mask the columns:

In [2]:
df = pd.DataFrame(columns=['a','Unnamed: 1', 'Unnamed: 1','foo'])
df

Out[2]:
Empty DataFrame
Columns: [a, Unnamed: 1, Unnamed: 1, foo]
Index: []

In [4]:
~df.columns.str.contains('Unnamed:')

Out[4]:
array([ True, False, False,  True], dtype=bool)

In [5]:
df[df.columns[~df.columns.str.contains('Unnamed:')]]

Out[5]:
Empty DataFrame
Columns: [a, foo]
Index: []
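The same mask can also be applied through .loc, which selects columns by position in the mask rather than by label. A sketch under the assumption that every column label is a string; str.startswith is swapped in for str.contains so that a label such as 'total Unnamed: x' would not match.

```python
import pandas as pd

# Hypothetical one-row frame for illustration.
df = pd.DataFrame([[1, 2, 3, 4]], columns=["a", "Unnamed: 1", "Unnamed: 2", "foo"])

# Boolean array: True where the label starts with "Unnamed:".
mask = df.columns.str.startswith("Unnamed:")

# Invert the mask to keep everything that is NOT an "Unnamed:" column.
cleaned = df.loc[:, ~mask]
```

Since the mask is positional, this form also behaves predictably when several columns share the same name.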
EdChum answered Oct 19 '22 17:10


My personal favorite, and easier than the answers I have seen here (for multiple columns):

df.drop(df.columns[22:56], axis=1, inplace=True)
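A smaller sketch of the positional slice, assuming eight made-up columns named col0 to col7. Positional slices are half-open: the start position is dropped and the stop position is kept.

```python
import pandas as pd

# Hypothetical frame with columns col0 .. col7.
df = pd.DataFrame([list(range(8))], columns=[f"col{i}" for i in range(8)])

# df.columns[2:5] yields the labels at positions 2, 3 and 4 (5 is excluded).
trimmed = df.drop(columns=df.columns[2:5])
```

Because drop() works on labels, this relies on the column names being unique; with duplicated names, every column sharing a selected label would be removed.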
sheldonzy answered Oct 19 '22 18:10


This is probably a good way to do what you want. It will delete all columns that contain 'Unnamed' in their header.

for col in df.columns:
    if 'Unnamed' in col:
        del df[col]
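A possible variation on the loop above: collect the matching labels first, then remove them with a single drop() call. This is a sketch on a made-up frame; the str(col) conversion is an added assumption so that non-string labels (here the integer 7) do not raise a TypeError in the membership test.

```python
import pandas as pd

# Hypothetical frame with one non-string column label.
df = pd.DataFrame([[1, 2, 3, 4]], columns=["a", "Unnamed: 1", 7, "foo"])

# Build the list of labels to remove without modifying df while iterating.
to_drop = [col for col in df.columns if "Unnamed" in str(col)]

df = df.drop(columns=to_drop)
```

Separating the selection from the deletion also makes it easy to inspect to_drop before anything is removed.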
knightofni answered Oct 19 '22 18:10