Given this data frame:
In [40]: df = pd.DataFrame({'A': [1, 1], 'B': [2, 2], 'C': [3, 3]})
In [41]: df
Out[41]:
   A  B  C
0  1  2  3
1  1  2  3
If I pass a list of strings to [], it will filter columns:
In [42]: df[['A', 'C']]
Out[42]:
   A  C
0  1  3
1  1  3
But if I pass a list of booleans to [], it will filter rows:
In [45]: df[[True, False]]
Out[45]:
   A  B  C
0  1  2  3
Is there a way to think about this difference, rather than "it's just the way it is"?
My understanding is that this copied R's behavior to make migrating R scripts easier; pandas also started with ix, which is now deprecated. There used to be a lot of ways to do slicing, but we have fewer now:
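Roughly, the main ones left are [] (i.e. __getitem__), loc, and iloc (plus at/iat for scalar access). A minimal sketch of each, written as a plain script rather than an IPython session, using the df defined above:

import pandas as pd

df = pd.DataFrame({'A': [1, 1], 'B': [2, 2], 'C': [3, 3]})

df[['A', 'C']]      # [] with a list of labels: selects columns
df[[True, False]]   # [] with a list of booleans: masks rows
df.loc[0, 'A']      # loc: label-based, rows then columns
df.iloc[0, 0]       # iloc: position-based, rows then columns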
Personally I like to use __getitem__ for all of those:
In [11]: df[['A', 'C']]
Out[11]:
   A  C
0  1  3
1  1  3
In [12]: df['A']
Out[12]:
0    1
1    1
Name: A, dtype: int64
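For what it's worth, the same [] also accepts a boolean Series, which is how the boolean path usually gets exercised in practice; a small sketch using the same df:

import pandas as pd

df = pd.DataFrame({'A': [1, 1], 'B': [2, 2], 'C': [3, 3]})

mask = df['A'] == 1   # a boolean Series aligned to df's index (here all True)
df[mask]              # filters rows, like df[[True, False]] but label-aligned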
The alternative, loc (or iloc), though it has less ambiguity, is too verbose:
In [13]: df.loc[:, ['A', 'B']]
Out[13]:
   A  B
0  1  2
1  1  2
In [14]: df.loc[:, 'A']
Out[14]:
0    1
1    1
Name: A, dtype: int64
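For reference, the positional (iloc) equivalents of the loc calls above; a quick sketch:

import pandas as pd

df = pd.DataFrame({'A': [1, 1], 'B': [2, 2], 'C': [3, 3]})

df.iloc[:, [0, 1]]   # all rows, first two columns, same as df.loc[:, ['A', 'B']]
df.iloc[:, 0]        # all rows, first column, same as df.loc[:, 'A']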
It's worth noting that boolean masking is not ambiguous, unless you have an esoteric example where the column names are themselves booleans and the length of the input list matches the length of the DataFrame:
In [21]: df1 = pd.DataFrame({True: [1, 2], False: [3, 4]})
In [22]: df1
Out[22]:
   False  True
0      3      1
1      4      2
In [23]: df1[[True, False]] # boolean slicing (not as column names)
Out[23]:
   False  True
0      3      1
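Even in that esoteric case there are unambiguous ways in: a scalar key to [] is always a column lookup, and iloc avoids labels entirely. A sketch using the df1 above:

import pandas as pd

df1 = pd.DataFrame({True: [1, 2], False: [3, 4]})

df1[True]       # scalar key: the column labeled True, never a mask
df1.iloc[[0]]   # positional row selection, no label ambiguity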
Historically, there was potential ambiguity in ix (as well as performance issues: there are a lot of possible paths to take). So as well as removing ambiguity, the move to loc and iloc also led to faster code (generally use iloc if you can; it's the fastest).
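If you want to check that speed claim on your own data, a rough timeit sketch (exact numbers depend on pandas version and hardware, so treat it as a sanity check rather than a benchmark):

import timeit

import pandas as pd

df = pd.DataFrame({'A': range(10_000)})

# Positional lookup skips label resolution, so iloc is usually quicker.
print('loc :', timeit.timeit(lambda: df.loc[500, 'A'], number=10_000))
print('iloc:', timeit.timeit(lambda: df.iloc[500, 0], number=10_000))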