Pandas Select DataFrame columns using boolean

Tags:

python

pandas

I want to use a boolean to select the columns with more than 4000 entries from a dataframe comb which has over 1,000 columns. This expression gives me a Boolean (True/False) result:

criteria = comb.ix[:,'c_0327':].count()>4000

I want to use it to select only the True columns into a new DataFrame.
The following just gives me "Unalignable boolean Series key provided":

comb.loc[criteria,]

I also tried:

comb.ix[:, comb.ix[:,'c_0327':].count()>4000]

This is similar to the answer to the question "dataframe boolean selection along columns instead of row", but it gives me the same error: "Unalignable boolean Series key provided".

comb.ix[:,'c_0327':].count()>4000

yields:

c_0327    False
c_0328    False
c_0329    False
c_0330    False
c_0331    False
c_0332    False
c_0333    False
c_0334    False
c_0335    False
c_0336    False
c_0337     True
c_0338    False
.....

asked Mar 26 '15 15:03

dartdog

2 Answers

What is returned is a Series with the column names as the index and the boolean values as the row values.

I think what you actually want is the following, which should work:

comb[criteria.index[criteria]]

Basically, this uses the boolean values in criteria to mask criteria's own index, which returns an array of the column names where the value is True. We can then use that array to select the columns of interest from the original df.
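
To make the masking step concrete, here is a minimal sketch on a small made-up frame. The column names, the data, and the threshold of 2 are assumptions for illustration, not the asker's data. It also uses .loc instead of .ix for the slice, since .ix has been removed from recent pandas releases.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `comb`: columns with different non-null counts.
comb = pd.DataFrame({
    "c_0326": [1.0, 2.0, 3.0, 4.0],
    "c_0327": [1.0, np.nan, np.nan, np.nan],
    "c_0328": [1.0, 2.0, 3.0, np.nan],
})

# count() ignores NaN, so this is a boolean Series indexed by column name.
criteria = comb.loc[:, "c_0327":].count() > 2

# Masking the Series' own index with its values keeps only the True labels.
selected_names = criteria.index[criteria]

# Those labels can then be used to pick columns from the original frame.
result = comb[selected_names]
print(list(result.columns))
```

Because the selection goes through column labels rather than a boolean mask, it should not matter that criteria only covers the columns from c_0327 onward.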

answered Oct 04 '22 16:10

EdChum

In pandas 0.25:

comb.loc[:, criteria]

This returns a DataFrame containing only the columns selected by the Boolean list or Series.
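
A small sketch of this, on hypothetical data (the names a, b, c and the threshold are made up). As far as I can tell, a boolean Series passed to .loc needs a label for every column of the frame. So when the criteria was computed on a slice of the columns, as in the question, one possible workaround is to reindex it to all columns and treat the missing ones as False.

```python
import numpy as np
import pandas as pd

# Hypothetical data; names and threshold are illustrative only.
comb = pd.DataFrame({
    "a": [1.0, 2.0, 3.0],
    "b": [np.nan, np.nan, 1.0],
    "c": [1.0, 2.0, np.nan],
})

# Criteria computed over every column, so its index matches comb.columns.
criteria = comb.count() > 1
full = comb.loc[:, criteria]

# Criteria computed over a slice covers only some of the columns; reindexing
# to all columns (missing ones become False) lets it line up with the frame.
partial = comb.loc[:, "b":].count() > 1
aligned = partial.reindex(comb.columns, fill_value=False)
sliced = comb.loc[:, aligned]

print(list(full.columns), list(sliced.columns))
```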

For multiple criteria:

comb.loc[:, criteria1 & criteria2]

And for selecting rows with a criterion defined on the index:

comb[criteria]

Note: The bitwise operator & is required (not and). See "Logical operators for boolean indexing in Pandas".

Other note: If a criterion is an expression (e.g., comb.columnX > 3) and multiple criteria are combined, remember to enclose each expression in parentheses! This is because & and | have higher precedence than comparison operators such as > and == (whereas and and or have lower precedence).
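
A quick illustration of the parentheses point, using a made-up frame (the column names x and y are assumptions for the example):

```python
import pandas as pd

# Hypothetical frame; the column names are made up for illustration.
df = pd.DataFrame({"x": [1, 5, 7], "y": [10, 20, 30]})

# Each comparison is parenthesized so it is evaluated before `&`.
rows = df[(df["x"] > 3) & (df["y"] < 25)]
print(rows)

# Without the parentheses, `3 & df["y"]` would be evaluated first, which turns
# the expression into a chained comparison; pandas rejects that with a
# ValueError about the truth value of a Series being ambiguous.
```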

answered Oct 04 '22 15:10

johnDanger
