How do I delete a column that contains only zeros in Pandas?

Tags:

python

pandas

I currently have a dataframe whose columns contain only 1's and 0's as values. I would like to iterate through the columns and delete the ones that are made up of only 0's. Here's what I have tried so far:

    ones = []
    zeros = []
    for year in years:
        for i in range(0,599):
            if year[str(i)].values.any() == 1:
                ones.append(i)
            if year[str(i)].values.all() == 0:
                zeros.append(i)
        for j in ones:
            if j in zeros:
                zeros.remove(j)
        for q in zeros:
            del year[str(q)]

Here, years is a list of dataframes for the various years I am analyzing, ones collects the columns that contain a one, and zeros is meant to collect the columns containing all zeros. Is there a better way to delete a column based on a condition? For some reason I also have to check whether the columns in ones appear in zeros and remove them from zeros to obtain a list of all the zero columns.


asked Jan 16 '14 14:01

user2587593

2 Answers

df.loc[:, (df != 0).any(axis=0)]

Here is a break-down of how it works:

    In [74]: import pandas as pd

    In [75]: df = pd.DataFrame([[1,0,0,0], [0,0,1,0]])

    In [76]: df
    Out[76]:
       0  1  2  3
    0  1  0  0  0
    1  0  0  1  0

    [2 rows x 4 columns]

df != 0 creates a boolean DataFrame which is True where df is nonzero:

    In [77]: df != 0
    Out[77]:
           0      1      2      3
    0   True  False  False  False
    1  False  False   True  False

    [2 rows x 4 columns]

(df != 0).any(axis=0) returns a boolean Series indicating which columns have nonzero entries. (The any operation aggregates values along the 0-axis -- i.e. along the rows -- into a single boolean value. Hence the result is one boolean value for each column.)

    In [78]: (df != 0).any(axis=0)
    Out[78]:
    0     True
    1    False
    2     True
    3    False
    dtype: bool
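As a side note, the same mask can be built from the other direction, which may help explain why the loop in the question needed its extra clean-up step. The sketch below reuses the small example frame from this answer; the names `all_zero`, `has_nonzero` and `question_check` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame([[1, 0, 0, 0], [0, 0, 1, 0]])

# Columns in which every entry is zero.
all_zero = (df == 0).all(axis=0)

# Columns with at least one nonzero entry (the mask used in this answer).
has_nonzero = (df != 0).any(axis=0)

# The two masks are complements of each other.
assert (all_zero == ~has_nonzero).all()

# The check from the question: ndarray.all() is False as soon as a single
# value is zero, so `.values.all() == 0` is True for column 0 even though
# that column contains a 1.
question_check = df[0].values.all() == 0
print(question_check)
```

In other words, `values.all() == 0` tests "contains at least one zero" rather than "contains only zeros", which appears to be why columns with ones ended up in the zeros list.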

And df.loc can be used to select those columns:

    In [79]: df.loc[:, (df != 0).any(axis=0)]
    Out[79]:
       0  2
    0  1  0
    1  0  1

    [2 rows x 2 columns]

To "delete" the zero-columns, reassign df:

df = df.loc[:, (df != 0).any(axis=0)]
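For the situation in the question, where `years` is a list of DataFrames, one possible way to apply this is sketched below. The two small frames are hypothetical stand-ins for the real data; only the column-naming scheme (string labels such as "0", "1") is taken from the question.

```python
import pandas as pd

# Hypothetical stand-in for the question's list of per-year DataFrames.
years = [
    pd.DataFrame({"0": [1, 0], "1": [0, 0], "2": [0, 1]}),
    pd.DataFrame({"0": [0, 0], "1": [0, 1], "2": [0, 0]}),
]

# Rebinding `year` inside a for loop would not change the list itself,
# so build a new list of filtered frames instead.
years = [year.loc[:, (year != 0).any(axis=0)] for year in years]

print([list(year.columns) for year in years])
```

If the frames must be modified in place instead, dropping the all-zero column labels with `DataFrame.drop(columns=..., inplace=True)` should work as well.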


answered Oct 12 '22 12:10

unutbu

Here is an alternative:

df.replace(0,np.nan).dropna(axis=1,how="all")

Compared with unutbu's solution, this approach is noticeably slower:

    %timeit df.loc[:, (df != 0).any(axis=0)]
    652 µs ± 5.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

    %timeit df.replace(0,np.nan).dropna(axis=1,how="all")
    1.75 ms ± 9.49 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
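Apart from speed, the two approaches can also give different results when the data contains missing values. The sketch below uses a hypothetical frame (not from the question, whose data holds only 1's and 0's) in which column "b" contains nothing but a zero and a NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: column "b" holds only a zero and a missing value.
df = pd.DataFrame({"a": [1.0, 0.0], "b": [0.0, np.nan], "c": [0.0, 0.0]})

# Mask approach: NaN != 0 evaluates to True, so "b" counts as nonzero.
kept_by_mask = df.loc[:, (df != 0).any(axis=0)]

# replace/dropna approach: zeros become NaN, so "b" is all-NaN and is dropped.
kept_by_dropna = df.replace(0, np.nan).dropna(axis=1, how="all")

print(list(kept_by_mask.columns))
print(list(kept_by_dropna.columns))
```

With purely 0/1 data, as in the question, both approaches should select the same columns.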

answered Oct 12 '22 13:10

Jeremy Z
