For a df table like below,
   A  B  C  D
0  0  1  1  1
1  2  3  5  7
3  3  1  2  8
why are the double brackets needed to select specific columns after boolean indexing, i.e. the [['A','C']] part of df[df['A'] < 3][['A','C']]?
Indexing DataFrames
You can index a DataFrame with either single or double square brackets. Single brackets around one column name return a pandas Series, while double brackets (a list of column names inside the indexing operator) return a pandas DataFrame. Square brackets can also select observations (rows) from a DataFrame, for example with a slice or a boolean mask.
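As a minimal sketch of the difference, assuming the example table from the question (the variable names single, double and rows are only illustrative):

```python
import pandas as pd

# Rebuild the example table from the question (index labels 0, 1, 3).
df = pd.DataFrame(
    {"A": [0, 2, 3], "B": [1, 3, 1], "C": [1, 5, 2], "D": [1, 7, 8]},
    index=[0, 1, 3],
)

single = df["A"]    # one column name -> pandas Series
double = df[["A"]]  # list with one column name -> one-column DataFrame
rows = df[0:2]      # a slice inside the brackets selects rows

print(type(single).__name__)  # Series
print(type(double).__name__)  # DataFrame
print(rows)
```

The same column comes back in both cases; only the container type differs, which matters when later code expects DataFrame methods or a 2-D shape.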
The square brackets are syntactic sugar for the special method __getitem__. Any class can implement this method in its definition, and its instances then work with square brackets.
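To make the __getitem__ point concrete, here is a toy sketch. The Columns class is hypothetical and is not how pandas is implemented; it only shows how one method can react differently to a single name and to a list of names:

```python
class Columns:
    """Toy container (hypothetical, not pandas code)."""

    def __init__(self, data):
        self._data = data  # dict mapping column name -> list of values

    def __getitem__(self, key):
        # A list of names returns another Columns object ("2-D" result).
        if isinstance(key, list):
            return Columns({name: self._data[name] for name in key})
        # A single name returns the raw list of values ("1-D" result).
        return self._data[key]


table = Columns({"A": [0, 2, 3], "C": [1, 5, 2]})
one = table["A"]           # same as table.__getitem__("A")
many = table[["A", "C"]]   # same as table.__getitem__(["A", "C"])

print(one)
print(type(many).__name__)
```

The outer brackets call __getitem__; whatever sits inside them (a string, a list, a slice) is just the argument passed to that call.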
Boolean indexing is a standard pandas technique: it selects subsets of data based on the actual values in the DataFrame rather than on row/column labels or integer locations. The element-wise logical operators “&” (and) and “|” (or) combine several conditions into one mask. Wrap each condition in parentheses, because these operators bind more tightly than comparisons such as < and >.
Boolean indexing selects rows from a DataFrame using a boolean vector. The vector, typically a boolean Series produced by a comparison such as df['A'] < 3, needs one True/False value per row and is aligned with the DataFrame's index.
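A short sketch of boolean masks on the example table from the question; the names mask, subset, both and either are only illustrative:

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [0, 2, 3], "B": [1, 3, 1], "C": [1, 5, 2], "D": [1, 7, 8]},
    index=[0, 1, 3],
)

# A comparison on a column yields a boolean Series, one value per row.
mask = df["A"] < 3
subset = df[mask]  # keeps only the rows where the mask is True

# Combine conditions with & (and) and | (or); parentheses are required.
both = df[(df["A"] < 3) & (df["C"] > 1)]
either = df[(df["A"] < 1) | (df["D"] > 7)]

print(mask.tolist())
print(subset)
print(both)
print(either)
```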
For pandas objects (Series, DataFrame), the indexing operator [] selects column(s) when it is given a column name or a list of column names, and selects rows when it is given a slice or a boolean array. In df[[colname(s)]], the inner brackets build a list and the outer brackets are the indexing operator, so you must use double brackets to select two or more columns. With one column name, a single pair of brackets returns a Series, while double brackets return a DataFrame.
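The following sketch shows that the inner brackets are nothing more than a Python list, again assuming the example table from the question (cols, by_name, literal and result are illustrative names):

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [0, 2, 3], "B": [1, 3, 1], "C": [1, 5, 2], "D": [1, 7, 8]},
    index=[0, 1, 3],
)

# The list can be built first and passed to the indexing operator.
cols = ["A", "C"]
by_name = df[cols]
literal = df[["A", "C"]]  # same thing, with the list written inline

# Without the inner brackets, Python passes the tuple ("A", "C") as a
# single key, and this frame has no column with that label.
try:
    df["A", "C"]
    tuple_lookup_failed = False
except KeyError:
    tuple_lookup_failed = True

# The expression from the question: filter rows, then pick two columns.
result = df[df["A"] < 3][["A", "C"]]
print(result)
```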
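Also, df.loc[df['A'] < 3, ['A','C']] is better than the chained selection: it performs the row and column selection in a single indexing call, which avoids the ambiguity over whether a copy or a view of the DataFrame is returned. (The older df.ix[df['A'] < 3, ['A','C']] worked the same way, but .ix was deprecated and then removed in pandas 1.0.)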
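A minimal sketch comparing the two forms on the example table; selected and chained are illustrative names, and the assignment at the end is an added illustration of where the single-call form matters most:

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [0, 2, 3], "B": [1, 3, 1], "C": [1, 5, 2], "D": [1, 7, 8]},
    index=[0, 1, 3],
)

# One indexing call: row mask first, column list second.
selected = df.loc[df["A"] < 3, ["A", "C"]]

# Chained selection gives the same values when only reading data.
chained = df[df["A"] < 3][["A", "C"]]
print(selected.equals(chained))

# For assignment, the single .loc call writes to df itself.
df.loc[df["A"] < 3, "C"] = 0
print(df)
```

With the chained form, an assignment may land on a temporary intermediate object instead of df, which is why a single .loc call is the usual recommendation for writes.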
Please refer to the pandas documentation for details.