I would like to drop a given column from a pandas dataframe IF all the values in the column is "0%".
my df:
data = {'UK': ['11%', '16%', '7%', '52%', '2%', '5%', '3%', '3%'],
        'US': ['0%', '0%', '0%', '0%', '0%', '0%', '0%', '0%'],
        'DE': ['11%', '16%', '7%', '52%', '2%', '5%', '3%', '3%'],
        'FR': ['11%', '16%', '7%', '52%', '2%', '5%', '3%', '3%']
        }
dummy_df = pd.DataFrame(data, 
                        index=    ['cat1','cat2','cat3','cat4','cat5','cat6','cat7','cat8'], 
                        columns=['UK', 'US', 'DE', 'FR'])
my code so far:
dummy_df.drop(dummy_df == '0%',inplace=True)
I get a value error:
ValueError: labels ['UK' 'US' 'DE' 'FR'] not contained in axis
                Dropping a Pandas column by its position (or index) can be done by using the . drop() method. The method allows you to access columns by their index position.
If we need to delete the first 'n' columns from a DataFrame, we can use DataFrame. iloc and the Python range() function to specify the columns' range to be deleted. We need to use the built-in function range() with columns parameter of DataFrame. drop() .
Use pandas. DataFrame. drop() method to delete/remove rows with condition(s).
In [186]: dummy_df.loc[:, ~(dummy_df == '0%').all()]
Out[186]:
       UK   DE   FR
cat1  11%  11%  11%
cat2  16%  16%  16%
cat3   7%   7%   7%
cat4  52%  52%  52%
cat5   2%   2%   2%
cat6   5%   5%   5%
cat7   3%   3%   3%
cat8   3%   3%   3%
Explanation:
The comparison with '0%' you already got, this gives the following dataframe:
In [182]: dummy_df == '0%'
Out[182]:
         UK    US     DE     FR
cat1  False  True  False  False
cat2  False  True  False  False
cat3  False  True  False  False
cat4  False  True  False  False
cat5  False  True  False  False
cat6  False  True  False  False
cat7  False  True  False  False
cat8  False  True  False  False
Now we want to know which columns has all Trues:
In [183]: (dummy_df == '0%').all()
Out[183]:
UK    False
US     True
DE    False
FR    False
dtype: bool
And finally, we can index with these boolean values (but taking the opposite with ~ as want don't want to select where this is True): dummy_df.loc[:, ~(dummy_df == '0%').all()].
Similarly, you can also do: dummy_df.loc[:, (dummy_df != '0%').any()] (selects columns where at least one value is not equal to '0%')
First get the columns where all values != '0%'
In [163]: cols = (dummy_df != '0%').any()
In [164]: cols
Out[164]:
UK     True
US    False
DE     True
FR     True
dtype: bool
Then call only cols columns which are True
In [165]: dummy_df[cols[cols].index]
Out[165]:
       UK   DE   FR
cat1  11%  11%  11%
cat2  16%  16%  16%
cat3   7%   7%   7%
cat4  52%  52%  52%
cat5   2%   2%   2%
cat6   5%   5%   5%
cat7   3%   3%   3%
cat8   3%   3%   3%
                        If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With