I currently have a dataframe consisting of columns with 1's and 0's as values, I would like to iterate through the columns and delete the ones that are made up of only 0's. Here's what I have tried so far:
ones = [] zeros = [] for year in years: for i in range(0,599): if year[str(i)].values.any() == 1: ones.append(i) if year[str(i)].values.all() == 0: zeros.append(i) for j in ones: if j in zeros: zeros.remove(j) for q in zeros: del year[str(q)]
In which years is a list of dataframes for the various years I am analyzing, ones consists of columns with a one in them and zeros is a list of columns containing all zeros. Is there a better way to delete a column based on a condition? For some reason I have to check whether the ones columns are in the zeros list as well and remove them from the zeros list to obtain a list of all the zero columns.
By default axis = 0 meaning to remove rows. Use axis=1 or columns param to remove columns. By default, pandas return a copy DataFrame after deleting rows, use inpalce=True to remove from existing referring DataFrame.
The Pandas drop() function in Python is used to drop specified labels from rows and columns. Drop is a major function used in data science & Machine Learning to clean the dataset. Pandas Drop() function removes specified labels from rows or columns.
df.loc[:, (df != 0).any(axis=0)]
Here is a break-down of how it works:
In [74]: import pandas as pd In [75]: df = pd.DataFrame([[1,0,0,0], [0,0,1,0]]) In [76]: df Out[76]: 0 1 2 3 0 1 0 0 0 1 0 0 1 0 [2 rows x 4 columns]
df != 0
creates a boolean DataFrame which is True where df
is nonzero:
In [77]: df != 0 Out[77]: 0 1 2 3 0 True False False False 1 False False True False [2 rows x 4 columns]
(df != 0).any(axis=0)
returns a boolean Series indicating which columns have nonzero entries. (The any
operation aggregates values along the 0-axis -- i.e. along the rows -- into a single boolean value. Hence the result is one boolean value for each column.)
In [78]: (df != 0).any(axis=0) Out[78]: 0 True 1 False 2 True 3 False dtype: bool
And df.loc
can be used to select those columns:
In [79]: df.loc[:, (df != 0).any(axis=0)] Out[79]: 0 2 0 1 0 1 0 1 [2 rows x 2 columns]
To "delete" the zero-columns, reassign df
:
df = df.loc[:, (df != 0).any(axis=0)]
Here is an alternative way to use is
df.replace(0,np.nan).dropna(axis=1,how="all")
Compared with the solution of unutbu, this way is obviously slower:
%timeit df.loc[:, (df != 0).any(axis=0)] 652 µs ± 5.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) %timeit df.replace(0,np.nan).dropna(axis=1,how="all") 1.75 ms ± 9.49 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With