Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas: Why are double brackets needed to select column after boolean indexing

For a df table like below,

   A B C D 0  0 1 1 1 1  2 3 5 7 3  3 1 2 8 

why are the double brackets needed for selecting specific columns after boolean indexing?

the [['A','C']] part of  df[df['A'] < 3][['A','C']] 
like image 697
FortuneFaded Avatar asked Oct 29 '15 15:10

FortuneFaded


People also ask

Why do we use double brackets in pandas?

Indexing DataFrames You can either use a single bracket or a double bracket. The single bracket will output a Pandas Series, while a double bracket will output a Pandas DataFrame. Square brackets can also be used to access observations (rows) from a DataFrame.

Why does pandas LOC use square brackets?

The square brackets are syntactic sugar for the special method __getitem__ . All objects can implement this method in their class definition and then subsequently work with the square brackets.

What is boolean indexing pandas?

Pandas boolean indexing is a standard procedure. We will select the subsets of data based on the actual values in the DataFrame and not on their row/column labels or integer locations. Pandas indexing operators “&” and “|” provide easy access to select values from Pandas data structures across various use cases.

Is boolean indexing possible in DataFrame?

Boolean indexing helps us to select the data from the DataFrames using a boolean vector. We need a DataFrame with a boolean index to use the boolean indexing.


1 Answers

For pandas objects (Series, DataFrame), the indexing operator [] only accepts

  1. colname or list of colnames to select column(s)
  2. slicing or Boolean array to select row(s), i.e. it only refers to one dimension of the dataframe.

For df[[colname(s)]], the interior brackets are for list, and the outside brackets are indexing operator, i.e. you must use double brackets if you select two or more columns. With one column name, single pair of brackets returns a Series, while double brackets return a dataframe.

Also, df.ix[df['A'] < 3,['A','C']] or df.loc[df['A'] < 3,['A','C']] is better than the chained selection for avoiding returning a copy versus a view of the dataframe.

Please refer pandas documentation for details

like image 167
Joseph Avatar answered Sep 25 '22 14:09

Joseph