I have a large pandas DataFrame (>100 columns). I need to drop various sets of columns, and I'm hoping there is a way of using the good old
df.drop(df.columns['slices'],axis=1)
I've built selections such as:
a = df.columns[3:23]
b = df.columns[-6:]
where a and b represent the column sets I want to drop.
The following
list(df)[3:23]+list(df)[-6:]
yields the correct selection, but I can't get it to work with drop:
df.drop(df.columns[list(df)[3:23]+list(df)[-6:]],axis=1)
ValueError: operands could not be broadcast together with shapes (20,) (6,)
I looked around but couldn't find an answer. Related questions:
Selecting last n columns and excluding last n columns in dataframe
(The next one pertains to the error I receive):
python numpy ValueError: operands could not be broadcast together with shapes
This one feels like they're having a similar issue, but the 'slices' aren't separate: Deleting multiple columns based on column names in Pandas
Cheers
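To drop levels from a multi-level column index, use df.columns.droplevel(). It accepts either a single level or a list of levels, so several levels can be removed in one call instead of calling it repeatedly.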
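A minimal sketch of dropping two levels at once; the three-level column index and its level names ("top", "mid", "bottom") are made up purely for illustration:

```python
import pandas as pd

# Hypothetical frame with a three-level column MultiIndex (illustration only)
columns = pd.MultiIndex.from_tuples(
    [("a", "x", "one"), ("a", "y", "two"), ("b", "x", "three")],
    names=["top", "mid", "bottom"],
)
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=columns)

# droplevel accepts a list, so both outer levels go in a single call;
# with only one level left, the result is a plain (non-Multi) Index
df.columns = df.columns.droplevel(["top", "mid"])
print(list(df.columns))  # ['one', 'two', 'three']
```

DataFrame.droplevel(["top", "mid"], axis=1) is an equivalent one-liner that returns a new frame instead of reassigning df.columns.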
To remove the first column of a pandas DataFrame, you can use iloc, drop(), the del keyword, or pop().
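A small side-by-side sketch of those four approaches on a hypothetical three-column frame. Note that iloc and drop() return a new frame, while del and pop() modify the frame in place, and del/pop() need the column label rather than its position:

```python
import pandas as pd

def make_df():
    # Small hypothetical frame used to compare the four approaches
    return pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# 1. iloc: keep every column from position 1 onward (new frame)
df1 = make_df().iloc[:, 1:]

# 2. drop(): drop the column whose label is at position 0 (new frame)
df2 = make_df()
df2 = df2.drop(columns=df2.columns[0])

# 3. del: removes the column in place; takes a label, not a position
df3 = make_df()
del df3[df3.columns[0]]

# 4. pop(): removes the column in place and returns it as a Series
df4 = make_df()
first = df4.pop(df4.columns[0])

for d in (df1, df2, df3, df4):
    print(list(d.columns))  # ['b', 'c'] each time
print(first.tolist())  # [1, 2]
```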
You can delete one or multiple columns of a DataFrame. To remove a single column, you can use the del keyword, the pop() method, or the drop() method. To delete multiple columns, use drop().
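For the multi-column case, a minimal sketch on a hypothetical four-column frame; passing columns=[...] is equivalent to passing the labels with axis=1:

```python
import pandas as pd

df = pd.DataFrame({"a": [1], "b": [2], "c": [3], "d": [4]})

# Drop several columns by name; both spellings return a new frame
trimmed = df.drop(columns=["b", "d"])
same = df.drop(["b", "d"], axis=1)

print(list(trimmed.columns))  # ['a', 'c']
print(trimmed.equals(same))   # True
```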
This returns a new DataFrame with the selected columns removed (the original is left unchanged):
df.drop(list(df)[2:5], axis=1)
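Applied to the question, the same idea works with two slices: concatenate the plain Python lists of labels before passing them to drop. If you prefer to keep a and b as Index objects, Index.append joins them; + on two Index objects is element-wise arithmetic, not concatenation, so it is not a safe way to combine them. The 40-column frame below is hypothetical, standing in for the real >100-column data:

```python
import pandas as pd

# Hypothetical 40-column frame standing in for the asker's data
df = pd.DataFrame([list(range(40))], columns=[f"col{i}" for i in range(40)])

# Option 1: concatenate plain Python lists of labels, then drop them
to_drop = list(df)[3:23] + list(df)[-6:]
out1 = df.drop(to_drop, axis=1)

# Option 2: keep the Index slices and join them with Index.append
a = df.columns[3:23]
b = df.columns[-6:]
out2 = df.drop(a.append(b), axis=1)

print(out1.shape)         # (1, 14)
print(out1.equals(out2))  # True
```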
You can use np.r_ to seamlessly combine multiple ranges/slices:
import numpy as np
import pandas as pd
from string import ascii_uppercase
df = pd.DataFrame(columns=list(ascii_uppercase))
idx = np.r_[3:10, -5:0]
print(idx)
[ 3  4  5  6  7  8  9 -5 -4 -3 -2 -1]
You can then use idx to index your columns and pass the result to pd.DataFrame.drop:
df.drop(df.columns[idx], axis=1, inplace=True)
print(df.columns)
Index(['A', 'B', 'C', 'K', 'L', 'M', 'N',
       'O', 'P', 'Q', 'R', 'S', 'T', 'U'], dtype='object')
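Plugging the question's own slices into np.r_ gives a one-call answer. This is a sketch on a hypothetical 40-column frame; the trailing slice is written as -6:0 so that np.r_ gets an explicit stop value:

```python
import numpy as np
import pandas as pd

# Hypothetical 40-column frame standing in for the real data
df = pd.DataFrame([list(range(40))], columns=[f"col{i}" for i in range(40)])

# Positions 3..22 followed by the last six columns (-6..-1)
idx = np.r_[3:23, -6:0]

# Translate positions to labels, then drop them in a single call
out = df.drop(columns=df.columns[idx])
print(out.shape)  # (1, 14)
```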
You can use this simple solution:
cols = [3,7,10,12,14,16,18,20,22]
df.drop(df.columns[cols],axis=1,inplace=True)
The result:
   0   1   2  4  5  6  8    9  11      13    15      17    19    21
0  3  12  10  3  2  1  7  512  64  1024.0  -1.0    -1.0  -1.0  -1.0
1  5  12  10  3  2  1  7   16   2    32.0  32.0  1024.0  -1.0  -1.0
2  5  12  10  3  2  1  7  512   2    32.0  32.0    32.0  -1.0  -1.0
3  5  12  10  3  2  1  7   16   1    32.0  64.0  1024.0  -1.0  -1.0
As you can see, the columns at the given positions have all been deleted.
If your columns have names (say A, B, C, ...), you can put the names in cols instead of integer positions, for example:
cols = ['A','B','C','F']
In that case, pass the list straight to drop, because df.columns[...] expects positions rather than labels:
df.drop(cols, axis=1, inplace=True)