Get column names whose number of distinct values is greater than a specified value in Python

Tags:

python

pandas

DataFrame X:

A   B    C    D
V1  V2   V3   V4
V1  V3   V4   V5
V1  V4   V5   V5
V1  V5   V9   V5
V1  V2   V3   V4
V1  V10  V11  V12
V1  V10  V6   V8
V1  V12  V7   V8

Here column A has 1 unique value, column B has 6, column C has 7 and column D has 4.
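To check those counts, here is a minimal sketch that rebuilds X from the table above and calls DataFrame.nunique(), which counts the distinct values in each column (NaN is excluded by default). The column contents are copied from the question; nothing else is assumed.

```python
import pandas as pd

# Rebuild the example DataFrame X from the question
X = pd.DataFrame({
    "A": ["V1"] * 8,
    "B": ["V2", "V3", "V4", "V5", "V2", "V10", "V10", "V12"],
    "C": ["V3", "V4", "V5", "V9", "V3", "V11", "V6", "V7"],
    "D": ["V4", "V5", "V5", "V5", "V4", "V12", "V8", "V8"],
})

# nunique() returns a Series: one distinct-value count per column label
counts = X.nunique()
print(counts.to_dict())  # {'A': 1, 'B': 6, 'C': 7, 'D': 4}
```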

I need a list of all columns whose number of unique values is greater than some threshold, say 4.

X.columns[(X.nunique() > 4).any()]

I expect to get only columns B and C here, but I get all columns. How can I get the desired output?

asked Mar 13 '20 at 09:03 by noob


1 Answer

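You are really close; just remove .any(). It collapses the per-column boolean mask into a single True, so the result no longer filters individual columns. Index with the mask itself: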

c = X.columns[X.nunique() > 4]
print(c)
Index(['B', 'C'], dtype='object')
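The difference between the two expressions can be seen directly. This sketch (rebuilding X from the question) prints the per-column mask, the single value that .any() reduces it to, and the labels selected by the mask itself.

```python
import pandas as pd

# Same data as DataFrame X in the question
X = pd.DataFrame({
    "A": ["V1"] * 8,
    "B": ["V2", "V3", "V4", "V5", "V2", "V10", "V10", "V12"],
    "C": ["V3", "V4", "V5", "V9", "V3", "V11", "V6", "V7"],
    "D": ["V4", "V5", "V5", "V5", "V4", "V12", "V8", "V8"],
})

mask = X.nunique() > 4           # boolean Series, one entry per column
print(mask.tolist())             # [False, True, True, False]

collapsed = mask.any()           # a single bool: "is any column above 4?"
print(collapsed)                 # True

selected = X.columns[mask]       # boolean indexing keeps only the True labels
print(selected.tolist())         # ['B', 'C']
```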

If you need to select those columns, use DataFrame.loc:

df = X.loc[:, X.nunique() > 4]
print(df)
     B    C
0   V2   V3
1   V3   V4
2   V4   V5
3   V5   V9
4   V2   V3
5  V10  V11
6  V10   V6
7  V12   V7
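Since the question asks for a plain list with a configurable threshold, the same idea can be wrapped in a small helper. The function name columns_with_more_unique is hypothetical and only for illustration; it uses DataFrame.nunique(dropna=...) and Index.tolist(), and passes dropna through so you can decide whether NaN counts as a distinct value.

```python
import pandas as pd


def columns_with_more_unique(df: pd.DataFrame, threshold: int, dropna: bool = True) -> list:
    """Hypothetical helper: labels of columns with more than `threshold` distinct values."""
    counts = df.nunique(dropna=dropna)           # distinct-value count per column
    return counts[counts > threshold].index.tolist()


# Same data as DataFrame X in the question
X = pd.DataFrame({
    "A": ["V1"] * 8,
    "B": ["V2", "V3", "V4", "V5", "V2", "V10", "V10", "V12"],
    "C": ["V3", "V4", "V5", "V9", "V3", "V11", "V6", "V7"],
    "D": ["V4", "V5", "V5", "V5", "V4", "V12", "V8", "V8"],
})

print(columns_with_more_unique(X, 4))  # ['B', 'C']
print(columns_with_more_unique(X, 3))  # ['B', 'C', 'D']

# The returned list can also be used to select the columns
subset = X[columns_with_more_unique(X, 4)]
```

A strict ">" comparison is used to match the question; switch to ">=" if the threshold itself should qualify.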
answered Sep 23 '22 at 00:09 by jezrael