Why does pandas DataFrame's [] (__getitem__) sometimes select columns, sometimes rows?

Tags: python, pandas

Given this data frame:

In [40]: df = pd.DataFrame({'A': [1, 1], 'B': [2, 2], 'C': [3, 3]})

In [41]: df
Out[41]:
   A  B  C
0  1  2  3
1  1  2  3

If I pass a list of strings to [], it will filter columns:

In [42]: df[['A', 'C']]
Out[42]:
   A  C
0  1  3
1  1  3

But if I pass a list of booleans to [], it will filter rows:

In [45]: df[[True, False]]
Out[45]:
   A  B  C
0  1  2  3

Is there a way to think about this difference, rather than "it's just the way it is"?

Heisenberg asked Nov 07 '22 14:11


1 Answer

My understanding is that this copied R's behavior, to make migrating R scripts easier. It also goes back to ix, which is now deprecated. There used to be a lot of ways to do slicing, but we have fewer now:

  1. single item: get a column.
  2. list of columns: get a "subframe".
  3. list of booleans: boolean indexing, which filters rows.

Personally I like to use __getitem__ for all of those:

In [11]: df[['A', 'C']]
Out[11]:
   A  C
0  1  3
1  1  3

In [12]: df['A']
Out[12]:
0    1
1    1
Name: A, dtype: int64

The alternative, loc (or iloc), is less ambiguous, but to my taste it is too verbose:

In [13]: df.loc[:, ['A', 'B']]
Out[13]:
   A  B
0  1  2
1  1  2

In [14]: df.loc[:, 'A']
Out[14]:
0    1
1    1
Name: A, dtype: int64
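
The boolean case has the same two spellings. Here is a minimal sketch, assuming the same df as in the question; the name mask and the result variable names are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1], 'B': [2, 2], 'C': [3, 3]})

# A boolean list with one entry per row (illustrative name).
mask = [True, False]

# __getitem__ treats a boolean list as a row filter...
rows_getitem = df[mask]

# ...while loc names the axes explicitly: rows first, then columns.
rows_loc = df.loc[mask]
rows_and_cols = df.loc[mask, ['A', 'C']]

# A boolean Series built from a condition is handled the same way.
rows_cond = df[df['A'] == 1]
```

With loc, the position of the argument says whether it applies to rows or columns, so the type of the key no longer has to carry that information.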

It's worth noting that boolean masking is not ambiguous, unless you have an esoteric example where the column names are themselves booleans and the length of the input matches that of the DataFrame:

In [21]: df1 = pd.DataFrame({True: [1, 2], False: [3, 4]})

In [22]: df1
Out[22]:
   False  True
0      3      1
1      4      2

In [23]: df1[[True, False]]  # boolean slicing (not as column names)
Out[23]:
   False  True
0      3      1

Historically, there was potential ambiguity in ix (as well as performance issues, since there are a lot of possible code paths to take). So as well as removing ambiguity, the move to loc and iloc also led to faster code (generally, use iloc if you can; it's the fastest).
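
For reference, the positional spellings of the earlier selections might look like this. It is a sketch on the same df, with illustrative variable names, and it only shows equivalence of results, not timings:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1], 'B': [2, 2], 'C': [3, 3]})

# Columns by position: same result as df[['A', 'C']].
cols_by_pos = df.iloc[:, [0, 2]]

# A single column by position: same result as df['A'].
col_a = df.iloc[:, 0]

# Rows by position: same result as df[[True, False]].
first_row = df.iloc[[0]]
```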

Andy Hayden answered Nov 14 '22 20:11