The difference between double bracket `[[...]]` and single bracket `[...]` indexing in pandas

Question

I'm confused about the syntax regarding the following line of code:

x_values = dataframe[['Brains']]

The dataframe object consists of two columns (Brains and Bodies):

Brains Bodies
42     34
32     23

When I print x_values I get something like this:

Brains
0  42
1  32

I'm aware of the pandas documentation as far as attributes and methods of the dataframe object are concerned, but the double bracket syntax is confusing me.

MaxU - stop WAR against UA · Accepted Answer

Consider this:

Source DF:

In [79]: df
Out[79]:
   Brains  Bodies
0      42      34
1      32      23

Selecting one column with a single label returns a pandas Series:

In [80]: df['Brains']
Out[80]:
0    42
1    32
Name: Brains, dtype: int64

In [81]: type(df['Brains'])
Out[81]: pandas.core.series.Series

Selecting a subset of columns with a list of labels returns a DataFrame:

In [82]: df[['Brains']]
Out[82]:
   Brains
0      42
1      32

In [83]: type(df[['Brains']])
Out[83]: pandas.core.frame.DataFrame

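Conclusion: the second approach (passing a list of labels) can select multiple columns and always returns a DataFrame. The first approach (passing a single label) selects exactly one column and returns it as a Series.

Demo (assuming `import numpy as np` and `import pandas as pd`):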

In [84]: df = pd.DataFrame(np.random.rand(5,6), columns=list('abcdef'))

In [85]: df
Out[85]:
          a         b         c         d         e         f
0  0.065196  0.257422  0.273534  0.831993  0.487693  0.660252
1  0.641677  0.462979  0.207757  0.597599  0.117029  0.429324
2  0.345314  0.053551  0.634602  0.143417  0.946373  0.770590
3  0.860276  0.223166  0.001615  0.212880  0.907163  0.437295
4  0.670969  0.218909  0.382810  0.275696  0.012626  0.347549

In [86]: df[['e','a','c']]
Out[86]:
          e         a         c
0  0.487693  0.065196  0.273534
1  0.117029  0.641677  0.207757
2  0.946373  0.345314  0.634602
3  0.907163  0.860276  0.001615
4  0.012626  0.670969  0.382810

If we put only one column in the list, we still get a DataFrame, just with a single column:

In [87]: df[['e']]
Out[87]:
          e
0  0.487693
1  0.117029
2  0.946373
3  0.907163
4  0.012626
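
As a rough, self-contained sketch of the same point, the snippet below rebuilds the two-column frame from the question and compares the two results. The variable names `as_series` and `as_frame` are only illustrative.

```python
import pandas as pd

# Rebuild the small frame from the question.
df = pd.DataFrame({'Brains': [42, 32], 'Bodies': [34, 23]})

as_series = df['Brains']     # string key -> one column as a Series
as_frame = df[['Brains']]    # list key   -> DataFrame with one column

print(type(as_series).__name__, as_series.shape)  # Series (2,)
print(type(as_frame).__name__, as_frame.shape)    # DataFrame (2, 1)

# The values are the same; only the container differs.
print(as_frame['Brains'].equals(as_series))       # True
```

The shapes make the difference visible: the Series is one-dimensional, while the single-column DataFrame keeps a second (column) axis of length 1.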

SethMMorton · Answer

There is no special syntax in Python for [[ and ]]. Rather, a list is being created, and then that list is being passed as an argument to the DataFrame indexing function.

As per @MaxU's answer, if you index a DataFrame with a single string, a Series representing that one column is returned. If you index it with a list of strings, a DataFrame containing the given columns is returned.

So, when you do the following

# Print "Brains" column as Series
print(df['Brains'])
# Print a DataFrame with only one column called "Brains"
print(df[['Brains']])

It is equivalent to the following

# Print "Brains" column as Series
column_to_get = 'Brains'
print(df[column_to_get])
# Print a DataFrame with only one column called "Brains"
subset_of_columns_to_get = ['Brains']
print(df[subset_of_columns_to_get])

In both cases, the DataFrame is being indexed with the [] operator.

Python uses square brackets both for indexing and for constructing list literals, and ultimately I believe this is the source of your confusion. The outer [ and ] in df[['Brains']] perform the indexing, and the inner pair creates a list.

>>> some_list = ['Brains']
>>> some_list_of_lists = [['Brains']]
>>> ['Brains'] == [['Brains']][0]
True
>>> 'Brains' == [['Brains']][0][0] == [['Brains'][0]][0]
True

What I am illustrating above is that at no point does Python ever see [[ and interpret it specially. In the last convoluted example ([['Brains'][0]][0]) there is no special ][ operator or ]][ operator... what happens is

1. A single-element list is created (['Brains'])
2. The first element of that list is indexed (['Brains'][0] => 'Brains')
3. That is placed into another list ([['Brains'][0]] => ['Brains'])
4. And then the first element of that list is indexed ([['Brains'][0]][0] => 'Brains')
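
To make that concrete for pandas, here is a small sketch (the frame contents are taken from the question; `columns` and `tuple_key_failed` are just illustrative names). It shows that the inner brackets build an ordinary list before pandas ever sees it, and that dropping them turns the key into a tuple, which a frame with plain column labels does not accept.

```python
import pandas as pd

df = pd.DataFrame({'Brains': [42, 32], 'Bodies': [34, 23]})

# The inner brackets are just a list literal, so these are the same call.
columns = ['Brains', 'Bodies']
assert df[['Brains', 'Bodies']].equals(df[columns])

# [] is sugar for __getitem__; the key is whatever object sits inside it.
assert df[columns].equals(df.__getitem__(columns))

# Without the inner brackets the key is the tuple ('Brains', 'Bodies'),
# which pandas looks up as a single column label and fails to find.
try:
    df['Brains', 'Bodies']
    tuple_key_failed = False
except KeyError:
    tuple_key_failed = True

print(tuple_key_failed)  # True
```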

mangal pavan · Answer

[ ] and [[ ]] are concepts that carry over from NumPy.

Try to understand the basics of creating an np.array, then use reshape and check the result with ndim, and it will make sense. See my answer here:

https://stackoverflow.com/a/70194733/7660981
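
For reference, a minimal sketch of the NumPy behaviour this answer alludes to, using made-up values: one level of brackets builds a 1-dimensional array, two levels build a 2-dimensional one, and reshape converts between them. Note that here the brackets are nested list literals passed to np.array, not an indexing operation.

```python
import numpy as np

one_d = np.array([42, 32])        # one level of brackets
two_d = np.array([[42, 32]])      # a list containing one list

print(one_d.ndim, one_d.shape)    # 1 (2,)
print(two_d.ndim, two_d.shape)    # 2 (1, 2)

# reshape turns the 1-D array into a single column with two rows.
column = one_d.reshape(-1, 1)
print(column.ndim, column.shape)  # 2 (2, 1)
```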

jpp · Answer

Other solutions demonstrate the difference between a series and a dataframe. For the mathematically minded, you may wish to consider the dimensions of your input and output. Here's a summary:

Object                                Series          DataFrame
Dimensions (obj.ndim)                      1                  2
Syntax arg dim                             0                  1
Syntax                             df['col']        df[['col']]
Max indexing dim                           1                  2
Label indexing              df['col'].loc[x]   df.loc[x, 'col']
Label indexing (scalar)      df['col'].at[x]    df.at[x, 'col']
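Integer indexing           df['col'].iloc[x]      df.iloc[x, j]
Integer indexing (scalar)   df['col'].iat[x]       df.iat[x, j]

Here j is the integer position of 'col' (for example, df.columns.get_loc('col')), since iloc and iat accept only integer positions, not labels.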

When you specify a scalar or list argument to pd.DataFrame.__getitem__, for which [] is syntactic sugar, the dimension of your argument is one less than the dimension of your result. So a scalar (0-dimensional) gives a 1-dimensional series. A list (1-dimensional) gives a 2-dimensional dataframe. This makes sense since the additional dimension is the dataframe index, i.e. rows. This is the case even if your dataframe happens to have no rows.
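
The dimension rule can be checked directly. The sketch below reuses the question's data and adds an empty frame (the name `empty` is illustrative) to show that the result keeps its dimensionality even with no rows.

```python
import pandas as pd

df = pd.DataFrame({'Brains': [42, 32], 'Bodies': [34, 23]})

print(df['Brains'].ndim)     # 1: scalar key (0-d) gives a Series
print(df[['Brains']].ndim)   # 2: list key (1-d) gives a DataFrame

# Same rule for a frame with columns but no rows.
empty = pd.DataFrame(columns=['Brains', 'Bodies'])
print(empty['Brains'].shape)    # (0,)
print(empty[['Brains']].shape)  # (0, 1)
```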
