Selecting all columns with at least one value above threshold

Tags:

pandas

I want to select columns which have at least one value above the threshold. For example,

from random import randint
import pandas as pd

df = pd.DataFrame({'A': [randint(1, 9) for x in range(10)],
                   'B': [randint(1, 9)*10 for x in range(10)],
                   'C': [randint(1, 9)*100 for x in range(10)]})
df
   A   B    C
0  9  40  300
1  9  70  700
2  5  70  900
3  8  80  900
4  7  50  200
5  9  30  900
6  2  80  700
7  2  80  400
8  5  80  300
9  7  70  800

Let's say I want to select the columns that contain at least one value > 70. In this case I'd expect the following dataframe as output:

   df
       B    C
    0  40  300
    1  70  700
    2  70  900
    3  80  900
    4  50  200
    5  30  900
    6  80  700
    7  80  400
    8  80  300
    9  70  800

The only solution I can come up with is to loop through every column, check whether any of its values is above the threshold (for example, using .any()), and then pass the list of matching columns to .filter(). That feels very awkward, though. Is there a better way?
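For reference, the loop-based approach described above might look roughly like the sketch below. The data is copied from the example frame shown earlier so the result is reproducible; the names `threshold`, `keep`, and `result` are illustrative assumptions, not part of the original question.

```python
import pandas as pd

# Values copied from the example frame in the question.
df = pd.DataFrame({
    'A': [9, 9, 5, 8, 7, 9, 2, 2, 5, 7],
    'B': [40, 70, 70, 80, 50, 30, 80, 80, 80, 70],
    'C': [300, 700, 900, 900, 200, 900, 700, 400, 300, 800],
})

threshold = 70  # illustrative name

# Collect the names of columns with at least one value above the threshold.
keep = []
for name in df.columns:
    if (df[name] > threshold).any():
        keep.append(name)

# DataFrame.filter(items=...) keeps only the listed column labels.
result = df.filter(items=keep)
print(list(result.columns))  # ['B', 'C']
```

This works, but it visits the columns one at a time in Python, which is the awkwardness the question is asking how to avoid.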

Rotkiv asked Oct 30 '25 15:10


1 Answer

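Build a boolean mask with (df>70).any(), which is True for every column that contains at least one value above 70, and use it to index df.columns: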

df[df.columns[(df>70).any()]]

Output (the values differ from the question's because the data is randomly generated):

    B    C
0  40  100
1  20  100
2  80  500
3  60  800
4  10  300
5  70  800
6  50  200
7  40  600
8  40  200
9  20  200
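
As a reproducible check, here is a minimal sketch that applies the same mask to the exact values shown in the question. It also uses `df.loc[:, mask]`, an equivalent spelling that passes the boolean Series straight to the column axis; the names `threshold`, `mask`, and `selected` are illustrative assumptions.

```python
import pandas as pd

# Values copied from the example frame in the question.
df = pd.DataFrame({
    'A': [9, 9, 5, 8, 7, 9, 2, 2, 5, 7],
    'B': [40, 70, 70, 80, 50, 30, 80, 80, 80, 70],
    'C': [300, 700, 900, 900, 200, 900, 700, 400, 300, 800],
})

threshold = 70  # illustrative name

# (df > threshold) is a boolean frame; .any() reduces each column to one bool.
mask = (df > threshold).any()

# Indexing the column axis with the boolean Series keeps only the True columns.
selected = df.loc[:, mask]
print(list(selected.columns))  # ['B', 'C']

# Same result as the df.columns form used in the answer.
assert selected.equals(df[df.columns[mask]])
```

Column A is dropped because its largest value is 9, while B and C each contain at least one value above 70.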

Scott Boston answered Nov 02 '25 21:11


