I would like to drop a given column from a pandas DataFrame if all the values in that column are "0%".
My df:
import pandas as pd

data = {'UK': ['11%', '16%', '7%', '52%', '2%', '5%', '3%', '3%'],
        'US': ['0%', '0%', '0%', '0%', '0%', '0%', '0%', '0%'],
        'DE': ['11%', '16%', '7%', '52%', '2%', '5%', '3%', '3%'],
        'FR': ['11%', '16%', '7%', '52%', '2%', '5%', '3%', '3%']
        }
dummy_df = pd.DataFrame(data,
                        index=['cat1', 'cat2', 'cat3', 'cat4', 'cat5', 'cat6', 'cat7', 'cat8'],
                        columns=['UK', 'US', 'DE', 'FR'])
My code so far:
dummy_df.drop(dummy_df == '0%',inplace=True)
I get a ValueError:
ValueError: labels ['UK' 'US' 'DE' 'FR'] not contained in axis
Dropping a pandas column by its position can be done with the .drop() method: look up the label at that position in DataFrame.columns and pass it to drop().
To delete the first n columns from a DataFrame, either slice with DataFrame.iloc (df.iloc[:, n:]) or pass the first n labels, df.columns[:n], to the columns parameter of DataFrame.drop(). The positions can also be generated with the built-in range() function.
Use the pandas.DataFrame.drop() method to delete rows that match one or more conditions by passing it the index labels of the matching rows.
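The three drop() usages mentioned above can be sketched as follows. This is only an illustration on a small made-up frame (the names df, n and the column labels a, b, c are assumptions, not part of the question), and it assumes a pandas version that supports the columns= and index= keywords of drop().

```python
import pandas as pd

# Hypothetical toy frame, used only for illustration.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})

# Drop a column by position: translate the position into a label first.
without_second = df.drop(columns=df.columns[1])

# Drop the first n columns, either via drop() or by slicing with iloc.
n = 2
without_first_n = df.drop(columns=df.columns[:n])
same_via_iloc = df.iloc[:, n:]

# Drop rows that satisfy a condition by passing their index labels.
without_big_a = df.drop(index=df[df["a"] > 1].index)
```

Note that drop() works on labels, so with duplicated column names the position-based variant would remove every column sharing that label; iloc avoids this.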
In [186]: dummy_df.loc[:, ~(dummy_df == '0%').all()]
Out[186]:
       UK   DE   FR
cat1  11%  11%  11%
cat2  16%  16%  16%
cat3   7%   7%   7%
cat4  52%  52%  52%
cat5   2%   2%   2%
cat6   5%   5%   5%
cat7   3%   3%   3%
cat8   3%   3%   3%
Explanation:
You already have the comparison with '0%'; it gives the following boolean dataframe:
In [182]: dummy_df == '0%'
Out[182]:
         UK    US     DE     FR
cat1  False  True  False  False
cat2  False  True  False  False
cat3  False  True  False  False
cat4  False  True  False  False
cat5  False  True  False  False
cat6  False  True  False  False
cat7  False  True  False  False
cat8  False  True  False  False
Now we want to know which columns contain only True values:
In [183]: (dummy_df == '0%').all()
Out[183]:
UK    False
US     True
DE    False
FR    False
dtype: bool
And finally, we can index with these boolean values (taking the opposite with ~, since we don't want to select the columns where this is True): dummy_df.loc[:, ~(dummy_df == '0%').all()]
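Put together, the steps of this explanation could be wrapped as in the sketch below. The helper name drop_columns_equal_to is hypothetical (it is not a pandas function); the data is the frame from the question.

```python
import pandas as pd

def drop_columns_equal_to(df, value):
    """Return df without the columns whose every entry equals value."""
    all_match = (df == value).all()  # one boolean per column
    return df.loc[:, ~all_match]     # keep columns that are NOT all-match

data = {'UK': ['11%', '16%', '7%', '52%', '2%', '5%', '3%', '3%'],
        'US': ['0%', '0%', '0%', '0%', '0%', '0%', '0%', '0%'],
        'DE': ['11%', '16%', '7%', '52%', '2%', '5%', '3%', '3%'],
        'FR': ['11%', '16%', '7%', '52%', '2%', '5%', '3%', '3%']}
dummy_df = pd.DataFrame(
    data,
    index=['cat1', 'cat2', 'cat3', 'cat4', 'cat5', 'cat6', 'cat7', 'cat8'])

result = drop_columns_equal_to(dummy_df, '0%')
print(list(result.columns))  # ['UK', 'DE', 'FR']
```

One thing to keep in mind: on a frame with zero rows, all() is True for every column, so this approach would drop all of them.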
Similarly, you can also do: dummy_df.loc[:, (dummy_df != '0%').any()]
(selects columns where at least one value is not equal to '0%')
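That the two masks select the same columns can be checked on a small example. The two-column frame below is made up for the check; only the two expressions come from the answer.

```python
import pandas as pd

# Hypothetical two-row frame: UK has one non-zero value, US has none.
df = pd.DataFrame({"UK": ["11%", "0%"], "US": ["0%", "0%"]})

keep_not_all_zero = ~(df == "0%").all()  # "not every value is 0%"
keep_any_nonzero = (df != "0%").any()    # "at least one value is not 0%"

print(keep_not_all_zero.tolist())  # [True, False]
print(keep_not_all_zero.equals(keep_any_nonzero))  # True
```

Both forms say the same thing, so the choice between them is a matter of which reads more naturally.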
First find the columns where at least one value != '0%':
In [163]: cols = (dummy_df != '0%').any()
In [164]: cols
Out[164]:
UK     True
US    False
DE     True
FR     True
dtype: bool
Then select only the columns for which cols is True:
In [165]: dummy_df[cols[cols].index]
Out[165]:
       UK   DE   FR
cat1  11%  11%  11%
cat2  16%  16%  16%
cat3   7%   7%   7%
cat4  52%  52%  52%
cat5   2%   2%   2%
cat6   5%   5%   5%
cat7   3%   3%   3%
cat8   3%   3%   3%
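If you would rather stay with drop(), as in the original attempt, the same boolean Series can be turned into column labels first. The attempt in the question presumably failed because drop() expects labels and, by default, looks for them in the row index; given a boolean DataFrame it ends up searching the rows for 'UK', 'US', 'DE' and 'FR'. A minimal sketch on a shortened version of the data, assuming a pandas version that supports the columns= keyword (on older versions, pass the labels with axis=1 instead):

```python
import pandas as pd

# Shortened version of the question's frame, for illustration only.
dummy_df = pd.DataFrame(
    {"UK": ["11%", "16%"], "US": ["0%", "0%"], "DE": ["11%", "16%"]},
    index=["cat1", "cat2"])

all_zero = (dummy_df == "0%").all()   # True for columns that are all '0%'
to_drop = all_zero[all_zero].index    # labels of those columns
trimmed = dummy_df.drop(columns=to_drop)

print(list(to_drop))          # ['US']
print(list(trimmed.columns))  # ['UK', 'DE']
```

Adding inplace=True to the drop() call would modify dummy_df directly instead of returning a new frame.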