Dataframe X:
A B C D
V1 V2 V3 V4
V1 V3 V4 V5
V1 V4 V5 V5
V1 V5 V9 V5
V1 V2 V3 V4
V1 V10 V11 V12
V1 V10 V6 V8
V1 V12 V7 V8
Here column A has 1 unique value, column B has 6, column C has 7, and column D has 4.
I need a list of all columns whose number of unique values exceeds some threshold, say 4.
X.columns[(X.nunique() > 4).any()]
I expect to get only columns B and C here, but I get all columns. How can I get the desired output?
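A minimal sketch to reproduce the question, assuming the cells are plain strings. It rebuilds X from the table above and checks the per-column counts with DataFrame.nunique, which counts distinct values per column and skips NaN by default.

```python
import pandas as pd

# Rebuild DataFrame X from the table in the question (assumed to hold strings)
X = pd.DataFrame({
    "A": ["V1"] * 8,
    "B": ["V2", "V3", "V4", "V5", "V2", "V10", "V10", "V12"],
    "C": ["V3", "V4", "V5", "V9", "V3", "V11", "V6", "V7"],
    "D": ["V4", "V5", "V5", "V5", "V4", "V12", "V8", "V8"],
})

# nunique() returns a Series: one distinct-value count per column
counts = X.nunique()
print(counts)  # A: 1, B: 6, C: 7, D: 4
```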
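You are really close; just remove .any(). Calling .any() collapses the per-column boolean mask into a single True, so it no longer picks out individual columns. Use the mask directly: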
c = X.columns[(X.nunique() > 4)]
print (c)
Index(['B', 'C'], dtype='object')
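A self-contained sketch contrasting the two masks. The helper columns_above is hypothetical, added only for illustration; it simply wraps the indexing shown above so the threshold can be varied.

```python
import pandas as pd

X = pd.DataFrame({
    "A": ["V1"] * 8,
    "B": ["V2", "V3", "V4", "V5", "V2", "V10", "V10", "V12"],
    "C": ["V3", "V4", "V5", "V9", "V3", "V11", "V6", "V7"],
    "D": ["V4", "V5", "V5", "V5", "V4", "V12", "V8", "V8"],
})

# Boolean Series with one entry per column label
mask = X.nunique() > 4
print(mask.tolist())            # [False, True, True, False]

# .any() reduces the whole mask to one scalar, losing the per-column info
print((X.nunique() > 4).any())  # True


def columns_above(df: pd.DataFrame, threshold: int) -> list:
    """Hypothetical helper: labels of columns with more than `threshold` unique values."""
    return df.columns[df.nunique() > threshold].tolist()


print(columns_above(X, 4))      # ['B', 'C']
```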
If you need to select those columns, use DataFrame.loc with the same mask:
df = X.loc[:, (X.nunique() > 4)]
print (df)
     B    C
0   V2   V3
1   V3   V4
2   V4   V5
3   V5   V9
4   V2   V3
5  V10  V11
6  V10   V6
7  V12   V7
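A sketch of the selection step, again rebuilding X from the question's table. It also checks that selecting by the label list (X[X.columns[mask]]) gives the same frame as X.loc[:, mask]; which form to use is mostly a matter of taste.

```python
import pandas as pd

X = pd.DataFrame({
    "A": ["V1"] * 8,
    "B": ["V2", "V3", "V4", "V5", "V2", "V10", "V10", "V12"],
    "C": ["V3", "V4", "V5", "V9", "V3", "V11", "V6", "V7"],
    "D": ["V4", "V5", "V5", "V5", "V4", "V12", "V8", "V8"],
})

mask = X.nunique() > 4

# Boolean mask on the column axis of .loc keeps only the True columns
df = X.loc[:, mask]

# Alternative: resolve the matching labels first, then select them by name
df2 = X[X.columns[mask]]

print(df.columns.tolist())  # ['B', 'C']
print(df.equals(df2))       # True
```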