Pandas Python, select columns based on rows conditions

Tags: python, pandas, dataframe, conditional-statements

I have a dataframe:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(2, 4))
print(df)
          0         1         2         3
0  1.489198  1.329603  1.590124  1.123505
1  0.024017  0.581033  2.500397  0.156280

I want to select the columns for which at least one row has a value greater than 2. I tried the following, but it did not work as expected.

df[df.columns[df.iloc[(0,1)]>2]]

In this toy example my expected output would be:

          2
0  1.590124
1  2.500397


asked Jun 30 '16 08:06

hans glick

2 Answers

Use gt and any to filter the df:

In [287]:
df.loc[:, df.gt(2).any()]

Out[287]:
          2
0  1.590124
1  2.500397

Here we use loc: the first argument, :, selects all rows, and the second is a boolean mask of the columns that meet the condition:

In [288]:
df.gt(2)

Out[288]:
       0      1      2      3
0  False  False  False  False
1  False  False   True  False

In [289]:
df.gt(2).any()

Out[289]:
0    False
1    False
2     True
3    False
dtype: bool
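
The masking steps above can be reproduced on a fixed frame so the result does not depend on random draws. This is a minimal sketch, assuming a pandas version where loc is available as the label-based indexer; the values are made up for illustration and are not the data from the question:

```python
import pandas as pd

# Fixed illustrative values (not the random data from the question).
df = pd.DataFrame([[1.5, 1.3, 1.6, 1.1],
                   [0.0, 0.6, 2.5, 0.2]])

mask = df.gt(2).any()        # one boolean per column: is any value > 2?
selected = df.loc[:, mask]   # all rows, only the columns where mask is True

print(mask.tolist())         # [False, False, True, False]
print(selected)
```

Keeping the mask in its own variable makes it easy to inspect before it is used for selection.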

In your example, df.iloc[(0,1)] selects the single cell at the first row and second column. Comparing that scalar to 2 yields one boolean rather than a mask over the columns, so indexing df.columns with it just returned the first column, which is why it didn't work:

In [291]:
df.iloc[(0,1)]

Out[291]:
1.3296030000000001

In [293]:
df.columns[df.iloc[(0,1)]>2]

Out[293]:
'0'
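
To make the difference concrete, it may help to compare what each comparison returns. This is a small sketch with made-up values; cell_check and column_mask are names chosen only for illustration:

```python
import pandas as pd

df = pd.DataFrame([[1.5, 1.3, 1.6, 1.1],
                   [0.0, 0.6, 2.5, 0.2]])

# iloc with a (row, column) pair returns one scalar, so the comparison
# produces a single boolean, not a per-column mask.
cell_check = df.iloc[0, 1] > 2
print(bool(cell_check))      # False

# Comparing the whole frame and reducing with any() gives one boolean
# per column, which is what column selection needs.
column_mask = (df > 2).any()
print(column_mask.shape)     # (4,)
```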


answered Nov 12 '22 09:11

EdChum

Create a mask with df > 2 and any, then select the columns with loc:

import numpy as np
import pandas as pd
np.random.seed(18)
df = pd.DataFrame(np.random.randn(2, 4))
print(df)
          0         1         2         3
0  0.079428  2.190202 -0.134892  0.160518
1  0.442698  0.623391  1.008903  0.394249

print ((df>2).any())
0    False
1     True
2    False
3    False
dtype: bool

print (df.loc[:, (df>2).any()])
          1
0  2.190202
1  0.623391
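
If the same selection is needed for different thresholds, the mask can be wrapped in a small function. This is only a sketch: columns_with_any_above is a hypothetical helper name, not part of pandas, and the frame uses made-up values:

```python
import pandas as pd

def columns_with_any_above(frame, threshold):
    """Return the columns of `frame` with at least one value > threshold."""
    return frame.loc[:, (frame > threshold).any()]

df = pd.DataFrame({"a": [0.1, 0.4], "b": [2.2, 0.6], "c": [-0.1, 1.0]})

result = columns_with_any_above(df, 2)
print(list(result.columns))   # ['b']

# When no column qualifies, the rows are kept but no columns remain.
empty = columns_with_any_above(df, 10)
print(empty.shape)            # (2, 0)
```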

EDIT (in response to a comment):

You can check your solution step by step:

It appears to work, but it always selects the second column (label 1, since Python counts from 0) when the condition is True:

print (df.iloc[(0,1)])
2.19020235741

print (df.iloc[(0,1)] > 2)
True

print (df.columns[df.iloc[(0,1)]>2])
1

print (df[df.columns[df.iloc[(0,1)]>2]])
0    2.190202
1    0.623391
Name: 1, dtype: float64

And it selects the first column (label 0) when the condition is False, because the booleans True and False are cast to 1 and 0:

np.random.seed(15)
df = pd.DataFrame(np.random.randn(2, 4))
print (df)
          0         1         2         3
0 -0.312328  0.339285 -0.155909 -0.501790
1  0.235569 -1.763605 -1.095862 -1.087766

print (df.iloc[(0,1)])
0.339284706046

print (df.iloc[(0,1)] > 2)
False

print (df.columns[df.iloc[(0,1)]>2])
0

print (df[df.columns[df.iloc[(0,1)]>2]])
0   -0.312328
1    0.235569
Name: 0, dtype: float64

If you change the column names, the same positions are selected:

np.random.seed(15)
df = pd.DataFrame(np.random.randn(2, 4))
df.columns = ['a','b','c','d']
print (df)
          a         b         c         d
0 -0.312328  0.339285 -0.155909 -0.501790
1  0.235569 -1.763605 -1.095862 -1.087766

print (df.iloc[(0,1)] > 2)
False

print (df[df.columns[df.iloc[(0,1)]>2]])
0   -0.312328
1    0.235569
Name: a, dtype: float64
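
The cast from booleans to 1 and 0 is plain Python behaviour and can be seen without pandas. The sketch below uses an ordinary list (labels is only an illustration). Whether a given pandas version accepts a scalar boolean as a position in df.columns may differ, so this shows only the underlying integer conversion:

```python
# bool is a subclass of int in Python, so True and False act as 1 and 0
# when used as positions in a sequence.
labels = ["a", "b", "c", "d"]

print(isinstance(True, int))   # True
print(True == 1, False == 0)   # True True
print(labels[True])            # b  (position 1)
print(labels[False])           # a  (position 0)
```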

answered Nov 12 '22 09:11

jezrael
