Unexpected difference between loc and ix

Tags: python, pandas

I've noticed a strange difference between loc and ix when subsetting a DataFrame in Pandas.

import pandas as pd

# Create a dataframe
df = pd.DataFrame({'id':[10,9,5,6,8], 'x1':[10.0,12.3,13.4,11.9,7.6], 'x2':['a','a','b','c','c']})
df.set_index('id', inplace=True)

df
      x1 x2
id         
10  10.0  a
9   12.3  a
5   13.4  b
6   11.9  c
8    7.6  c


df.loc[[10, 9, 7]] # 7 does not exist in the index so a NaN row is returned
df.loc[[7]] # KeyError: 'None of [[7]] are in the [index]'
df.ix[[7]] # 7 does not exist in the index so a NaN row is returned

Why does df.loc[[7]] throw an error while df.ix[[7]] returns a row with NaN? Is this a bug? If not, why are loc and ix designed this way?

(Note I'm using Pandas 0.17.1 on Python 3.5.1)

Ben asked Dec 14 '15 04:12


2 Answers

As @shanmuga says, this is (at least for loc) the intended and documented behaviour, and not a bug.

The documentation on loc (selection by label) gives the rule for this (http://pandas.pydata.org/pandas-docs/stable/indexing.html#selection-by-label):

At least 1 of the labels for which you ask, must be in the index or a KeyError will be raised!

This means that loc with a single-label list (e.g. df.loc[[7]]) raises an error if that label is not in the index, whereas loc with a list of several labels (e.g. df.loc[[7,8,9]]) does not raise as long as at least one of those labels is in the index.
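A minimal sketch of the rule above, using the frame from the question. The first part (a list in which no label exists raises KeyError) matches the quoted documentation. The second part uses reindex, which is not part of the original answer; it is swapped in here because it returns NaN rows for missing labels regardless of pandas version, whereas newer pandas releases (1.0 and later) are stricter and raise KeyError from loc when any requested label is missing.

```python
import pandas as pd

# Same frame as in the question.
df = pd.DataFrame(
    {
        "id": [10, 9, 5, 6, 8],
        "x1": [10.0, 12.3, 13.4, 11.9, 7.6],
        "x2": ["a", "a", "b", "c", "c"],
    }
).set_index("id")

# A list in which none of the labels exist raises KeyError.
try:
    df.loc[[7]]
    raised = False
except KeyError:
    raised = True

# reindex does not raise for missing labels; it fills those rows with NaN.
padded = df.reindex([10, 9, 7])

print(raised)
print(padded)
```

If the NaN row for label 7 is what you want, reindex states that intent explicitly instead of relying on how loc treats partially missing lists.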


For ix I am less sure, and I don't think this is clearly documented. In any case, ix is much more permissive, has a lot of edge cases (such as falling back to integer position), and is rather a rabbit hole. In general, ix returns a result indexed by the labels you provide (so it does not check whether the labels are in the index, as loc does), unless it falls back to integer-position indexing.
In most cases it is advised to use loc/iloc instead.
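As a rough sketch of what using loc/iloc can look like for the question's data: if the goal is "give me whichever of these labels exist", one option (an assumption about intent, not something the answer prescribes) is to build a boolean mask with Index.isin and pass it to loc, and to use iloc when you really mean positions. The name `wanted` is made up for illustration.

```python
import pandas as pd

# Same frame as in the question.
df = pd.DataFrame(
    {
        "id": [10, 9, 5, 6, 8],
        "x1": [10.0, 12.3, 13.4, 11.9, 7.6],
        "x2": ["a", "a", "b", "c", "c"],
    }
).set_index("id")

wanted = [10, 9, 7]  # 7 is not in the index

# Label-based: keep only the rows whose label is in `wanted`.
# Missing labels are silently skipped, and no NaN row is created.
existing = df.loc[df.index.isin(wanted)]

# Position-based: the first two rows, whatever their labels are.
first_two = df.iloc[:2]

print(existing)
print(first_two)
```

Here both selections happen to return the rows labelled 10 and 9, but only because those rows sit in the first two positions.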

joris answered Oct 26 '22 14:10


I think this behavior is intended, not a bug.
Although I couldn't find any official documentation, I found a comment by jreback (21 Mar 2014) on a GitHub issue indicating this:

ix can very subtly give wrong results (use an index of say even numbers)

you can use whatever function you want; ix is still there, but it doesn't provide the guarantees that loc provides, namely that it won't interpret a number as a location
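To make the "index of even numbers" remark concrete, here is a small sketch with loc and iloc. The Series `s` and its values are made up for illustration. The point is that on an integer index the same number can mean a label or a position, and loc and iloc each commit to one interpretation.

```python
import pandas as pd

# An integer index of even numbers: labels and positions disagree.
s = pd.Series(["a", "b", "c", "d"], index=[0, 2, 4, 6])

by_label = s.loc[2]      # the element whose label is 2
by_position = s.iloc[2]  # the element at position 2 (the third one)

print(by_label)
print(by_position)
```

The two lookups return different elements ('b' and 'c'), which is the kind of ambiguity an indexer that mixes labels and positions has to resolve for you.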


As for why it is designed this way, the docs say:

.ix supports mixed integer and label based access. It is primarily label based, but will fall back to integer positional access unless the corresponding axis is of integer type.

In my opinion, raising a KeyError would be ambiguous as to whether it came from the label lookup or from the integer-position lookup. Instead, ix returns NaN when given a list.

shanmuga answered Oct 26 '22 13:10