Let's say there is a DataFrame whose column names are numeric values, for example columns named 1000, 1001, etc. I need to drop every column that doesn't pass a certain filter test; in my case, all columns with names less than a certain value, say 1500.
I know how to do this directly (by listing every column) or by calling drop in a loop, but that seems very inefficient. I'm having trouble getting the syntax right.
I have tried something like this:
df.drop(df.columns[x for x in df.columns.values<str(1500)], axis=1))
or
df.drop(df.columns.values<str(1500)], axis=1)
but these are obviously wrong.
Please advise. Thank you!
I think the simplest approach is to create a boolean mask from the column names and then select the columns with loc:
import pandas as pd
df = pd.DataFrame(columns=range(10), index=[0])
print (df)
     0    1    2    3    4    5    6    7    8    9
0  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
#inverting boolean mask with ~
print (df.loc[:, ~(df.columns < 8)])
     8    9
0  NaN  NaN
print (df.columns >= 8)
[False False False False False False False False True True]
print (df.loc[:, df.columns >= 8])
     8    9
0  NaN  NaN
This gives the same result as calling drop with the filtered column names:
print (df.columns[df.columns < 8])
Int64Index([0, 1, 2, 3, 4, 5, 6, 7], dtype='int64')
print (df.drop(df.columns[df.columns < 8], axis=1))
     8    9
0  NaN  NaN
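Applied to the situation in the question, the two approaches above could look like the following minimal sketch. The frame, its column names (1000 to 1900 in steps of 100), and the variable names threshold, kept_by_mask, and kept_by_drop are made up for illustration; only the cutoff of 1500 comes from the question.

```python
import pandas as pd

# Hypothetical frame with integer column names 1000, 1100, ..., 1900
df = pd.DataFrame(0, index=[0], columns=range(1000, 2000, 100))

threshold = 1500  # cutoff mentioned in the question

# Approach 1: keep the columns whose name is >= threshold via a boolean mask
kept_by_mask = df.loc[:, df.columns >= threshold]

# Approach 2: drop the columns whose name is < threshold
kept_by_drop = df.drop(columns=df.columns[df.columns < threshold])

print(kept_by_mask.columns.tolist())
print(kept_by_drop.columns.tolist())
```

Both variants return a new DataFrame and leave df unchanged, so assign the result back to df if you want to replace the original.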
Consider a dataframe with column names 0 to 99.
   0  1  2  3  4  5  6  7  8  9  ...  90  91  92  93  94  95  96  97  98  99
0  0  0  0  0  0  0  0  0  0  0  ...   0   0   0   0   0   0   0   0   0   0
If you want to drop the columns with names less than 30,
df = df.drop([x for x in df.columns if x < 30], axis=1)
returns
   30  31  32  33  34  35  36  37  38  39  ...  90  91  92  93  94  95  96  97  98  99
0   0   0   0   0   0   0   0   0   0   0  ...   0   0   0   0   0   0   0   0   0   0
If your column names have dtype object (for example, they are stored as strings), convert them to integers first (this assumes numpy has been imported as np):
df.columns = df.columns.astype(np.int64)
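The conversion matters because strings compare lexicographically: "200" >= "1500" is True as a string comparison, even though 200 < 1500 as numbers. Here is a small sketch of converting first and filtering afterwards, assuming a made-up frame whose column labels are numeric strings (the labels and the name result are illustrative only):

```python
import pandas as pd

# Hypothetical frame whose column labels were read in as strings
df = pd.DataFrame(0, index=[0], columns=["200", "1000", "1500", "2000"])

# Convert the labels to integers so that comparisons are numeric
df.columns = df.columns.astype("int64")

# Keep only the columns whose name is at least 1500
result = df.loc[:, df.columns >= 1500]
print(result.columns.tolist())
```

If some labels cannot be parsed as integers, astype raises an error, so this only applies when every column name is numeric.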