I want to use a Boolean mask to select the columns with more than 4000 non-null entries from a DataFrame comb,
which has over 1,000 columns. This expression gives me a Boolean (True/False) result for each column:
criteria = comb.loc[:, 'c_0327':].count() > 4000
I want to use it to select only the columns marked True into a new DataFrame.
The following just gives me "Unalignable boolean Series key provided":
comb.loc[criteria,]
I also tried:
comb.loc[:, comb.loc[:, 'c_0327':].count() > 4000]
This is similar to the answer to the question "dataframe boolean selection along columns instead of row", but that approach gives me the same error: "Unalignable boolean Series key provided".
comb.loc[:, 'c_0327':].count() > 4000
yields:
c_0327    False
c_0328    False
c_0329    False
c_0330    False
c_0331    False
c_0332    False
c_0333    False
c_0334    False
c_0335    False
c_0336    False
c_0337     True
c_0338    False
.....
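A small reproduction may make the error easier to see. This is a sketch using a hypothetical toy frame (toy, with columns a, c_0327 and c_0328) and a threshold of 1 instead of 4000. The exception class and exact message differ between pandas versions, so the sketch only captures the message text.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: one column ('a') sits before 'c_0327', mimicking comb's layout.
toy = pd.DataFrame({
    "a": [1, 2, 3],
    "c_0327": [1, np.nan, np.nan],
    "c_0328": [1, 2, np.nan],
})

# Mask computed over a slice, so its index covers only part of toy.columns.
criteria = toy.loc[:, "c_0327":].count() > 1

try:
    toy.loc[:, criteria]
    error_message = ""
except Exception as exc:  # IndexingError in recent pandas versions
    error_message = str(exc)

print(criteria)
print(error_message)
```

The mask has no entry for column a, so pandas cannot line it up with all of toy's columns.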
What count() > 4000 returns is a Series with the column names as its index and the Boolean results as its values.
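To make this concrete: DataFrame.count() counts the non-missing values in each column, and comparing that result with a number gives a Boolean Series indexed by the column labels. A minimal sketch, assuming a hypothetical three-column frame and a threshold of 2 standing in for 4000:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for comb; NaN marks missing entries.
df = pd.DataFrame({
    "c_0327": [1.0, np.nan, np.nan],
    "c_0328": [1.0, 2.0, 3.0],
    "c_0329": [np.nan, 5.0, 6.0],
})

counts = df.count()      # non-NA values per column
criteria = counts > 2    # Boolean Series whose index is the column labels

print(counts)
print(criteria)
```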
I think what you actually want is this, which should work:
comb[criteria.index[criteria]]
This uses the Boolean values in criteria to mask its own index, which yields the names of the columns where criteria is True. Passing those names to comb then selects the columns of interest from the original DataFrame.
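A runnable sketch of that approach on a hypothetical small frame. The extra id column lies outside the sliced range, which is exactly the situation that makes .loc with the raw mask fail, and the threshold of 2 stands in for 4000.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a column ('id') outside the 'c_0327': slice.
comb = pd.DataFrame({
    "id": [1, 2, 3],
    "c_0327": [1.0, np.nan, np.nan],
    "c_0328": [1.0, 2.0, 3.0],
    "c_0329": [np.nan, 5.0, 6.0],
})

criteria = comb.loc[:, "c_0327":].count() > 2

# Mask the index with the Boolean values to get the matching column labels...
selected_names = criteria.index[criteria]
# ...then select those labels from the original frame (no alignment needed).
subset = comb[selected_names]

print(list(selected_names))
print(subset)
```

Because only labels are passed to comb[...], it does not matter that criteria covers just part of the columns.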
In pandas 0.25 you can also pass the Boolean Series directly to .loc:
comb.loc[:, criteria]
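This returns a DataFrame containing only the columns for which the Boolean list or Series is True. A Boolean Series must have an entry for every column label of comb (for example, criteria = comb.count() > 4000); a mask built from a slice such as 'c_0327': covers only part of the columns, and that mismatch is what raises "Unalignable boolean Series key provided".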
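If the mask has already been computed from a slice, one option (a sketch, not part of the original answers) is to reindex it to all of comb's columns, treating columns outside the slice as not selected. The frame and threshold below are hypothetical.

```python
import numpy as np
import pandas as pd

comb = pd.DataFrame({
    "id": [1, 2, 3],
    "c_0327": [1.0, np.nan, np.nan],
    "c_0328": [1.0, 2.0, 3.0],
})

partial = comb.loc[:, "c_0327":].count() > 2   # covers only part of comb.columns

# Give the mask an entry for every column; labels outside the slice become False.
full_mask = partial.reindex(comb.columns, fill_value=False)

result = comb.loc[:, full_mask]
print(result.columns.tolist())
```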
For multiple criteria:
comb.loc[:, criteria1 & criteria2]
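As a hedged illustration of combining two column masks: criteria1 is a count threshold and criteria2 (hypothetical) keeps only columns whose names start with c_. Both are Boolean Series indexed by the same column labels, so & combines them element by element.

```python
import numpy as np
import pandas as pd

comb = pd.DataFrame({
    "id": [1, 2, 3],
    "c_0327": [1.0, np.nan, np.nan],
    "c_0328": [1.0, 2.0, 3.0],
})

criteria1 = comb.count() > 2  # enough non-NA values
# Hypothetical second criterion: a name filter, wrapped as a Series over the columns.
criteria2 = pd.Series(comb.columns.str.startswith("c_"), index=comb.columns)

selected = comb.loc[:, criteria1 & criteria2]
print(selected.columns.tolist())
```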
And to select rows, use a Boolean Series aligned with the row index:
comb[criteria]
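A minimal sketch of row selection, assuming a hypothetical numeric column columnX. Here the mask is indexed like the rows, so plain [] filters rows rather than columns.

```python
import pandas as pd

comb = pd.DataFrame({"columnX": [1, 5, 2, 7], "columnY": [10, 20, 30, 40]})

row_criteria = comb["columnX"] > 3  # Boolean Series indexed like comb's rows
rows = comb[row_criteria]           # keeps only the rows where the mask is True

print(rows)
```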
Note: The bitwise operator & is required (not and). See Logical operators for boolean indexing in Pandas.
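A small sketch of the difference on a hypothetical frame: & combines two masks element by element, while and asks Python for a single truth value of a whole Series, which pandas refuses with a ValueError.

```python
import pandas as pd

comb = pd.DataFrame({"columnX": [1, 5, 2, 7], "columnY": [10, 20, 30, 40]})

mask_a = comb["columnX"] > 3
mask_b = comb["columnY"] < 40

combined = comb[mask_a & mask_b]  # element-wise AND of the two masks

try:
    comb[mask_a and mask_b]       # 'and' calls bool() on mask_a
    and_error = None
except ValueError as exc:
    and_error = str(exc)

print(combined)
print(and_error)
```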
Other note: If the criteria are expressions (e.g., comb.columnX > 3) and multiple criteria are combined, remember to enclose each expression in parentheses! This is because & and | have higher precedence than >, ==, etc. (whereas the keywords and and or have lower precedence).
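A sketch of why the parentheses matter, using a hypothetical frame. Without them, & binds first, so Python reads the expression as the chained comparison comb.columnX > (3 & comb.columnY) < 40, which again needs a single truth value and fails.

```python
import pandas as pd

comb = pd.DataFrame({"columnX": [1, 5, 2, 7], "columnY": [10, 20, 30, 40]})

# Each comparison is evaluated first, then the masks are combined with &.
good = comb[(comb.columnX > 3) & (comb.columnY < 40)]

# Without parentheses, & is applied to 3 and comb.columnY before any comparison.
try:
    comb[comb.columnX > 3 & comb.columnY < 40]
    precedence_error = None
except ValueError as exc:
    precedence_error = str(exc)

print(good)
print(precedence_error)
```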