I want to select the columns that have at least one value above a threshold. For example:
from random import randint
import pandas as pd
df = pd.DataFrame({'A': [randint(1, 9) for x in range(10)],
                   'B': [randint(1, 9)*10 for x in range(10)],
                   'C': [randint(1, 9)*100 for x in range(10)]})
df
A B C
0 9 40 300
1 9 70 700
2 5 70 900
3 8 80 900
4 7 50 200
5 9 30 900
6 2 80 700
7 2 80 400
8 5 80 300
9 7 70 800
Let's say I want to select the columns that contain at least one value > 70. In this case I'd expect the following DataFrame as output:
df
B C
0 40 300
1 70 700
2 70 900
3 80 900
4 50 200
5 30 900
6 80 700
7 80 400
8 80 300
9 70 800
The only solution I can come up with is to loop through every column, check whether any value is above the threshold (for example, using .any()), and then pass the list of matching columns to .filter(). That feels like a very awkward solution. Is there a better way?
Use df.columns with any:
df[df.columns[(df>70).any()]]
Output (the values differ from the question because the data is generated randomly):
B C
0 40 100
1 20 100
2 80 500
3 60 800
4 10 300
5 70 800
6 50 200
7 40 600
8 40 200
9 20 200
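To make the one-liner above easier to follow, here is a minimal sketch that splits it into steps, using the fixed values from the question instead of random ones so the result is reproducible. The names `threshold`, `mask`, and `selected` are illustrative and not part of the original answer, and `df.loc[:, mask]` is used here as an equivalent alternative to indexing through `df.columns`.

```python
import pandas as pd

# Fixed data copied from the question's first table (not random),
# so the selection can be checked by eye.
df = pd.DataFrame({
    'A': [9, 9, 5, 8, 7, 9, 2, 2, 5, 7],
    'B': [40, 70, 70, 80, 50, 30, 80, 80, 80, 70],
    'C': [300, 700, 900, 900, 200, 900, 700, 400, 300, 800],
})

threshold = 70  # illustrative name, taken from the "> 70" example

# df > threshold compares every cell and gives a boolean DataFrame.
# .any() (default axis=0) collapses each column to a single boolean:
# True if at least one value in that column exceeds the threshold.
mask = (df > threshold).any()

# A boolean Series indexed by column name can select columns directly.
selected = df.loc[:, mask]

print(mask)
print(selected)
```

With this data, `A` never exceeds 70, so only `B` and `C` are kept. Because the comparison is strict, the 70s in `B` do not count on their own; `B` qualifies because of its 80s.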