I've noticed a strange difference between loc and ix when subsetting a DataFrame in Pandas.
import pandas as pd
# Create a dataframe
df = pd.DataFrame({'id':[10,9,5,6,8], 'x1':[10.0,12.3,13.4,11.9,7.6], 'x2':['a','a','b','c','c']})
df.set_index('id', inplace=True)
df
x1 x2
id
10 10.0 a
9 12.3 a
5 13.4 b
6 11.9 c
8 7.6 c
df.loc[[10, 9, 7]] # 7 does not exist in the index so a NaN row is returned
df.loc[[7]] # KeyError: 'None of [[7]] are in the [index]'
df.ix[[7]] # 7 does not exist in the index so a NaN row is returned
Why does df.loc[[7]] throw an error while df.ix[[7]] returns a row with NaN? Is this a bug? If not, why are loc and ix designed this way?
(Note I'm using Pandas 0.17.1 on Python 3.5.1)
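For comparison, the same experiment behaves differently on recent pandas (1.0 and later): .ix has been removed (it was deprecated in 0.20), and .loc raises a KeyError if any label in a list is missing, not only when all of them are. A minimal sketch, assuming a recent pandas version, that reproduces the question and uses reindex to request the NaN row explicitly:

```python
import pandas as pd

# Same frame as in the question
df = pd.DataFrame({'id': [10, 9, 5, 6, 8],
                   'x1': [10.0, 12.3, 13.4, 11.9, 7.6],
                   'x2': ['a', 'a', 'b', 'c', 'c']})
df = df.set_index('id')

# pandas >= 1.0: a missing label anywhere in the list makes .loc raise KeyError
for labels in ([10, 9, 7], [7]):
    try:
        df.loc[labels]
    except KeyError as err:
        print(f"df.loc[{labels}] -> KeyError: {err}")

# reindex is the explicit way to get one row per requested label,
# with NaN filled in for labels that are not in the index
result = df.reindex([10, 9, 7])
print(result)
```

Using reindex makes the "missing labels become NaN rows" intent visible in the code instead of relying on an indexer's leniency.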
As @shanmuga says, this is (at least for loc) the intended and documented behaviour, and not a bug.
The documentation on loc/selection by label gives the rules on this (http://pandas.pydata.org/pandas-docs/stable/indexing.html#selection-by-label ):
At least 1 of the labels for which you ask, must be in the index or a KeyError will be raised!
This means that using loc with a list containing a single label (e.g. df.loc[[7]]) will raise an error if that label is not in the index, whereas using it with a list of several labels (e.g. df.loc[[7,8,9]]) will not raise an error as long as at least one of those labels is in the index.
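The rule quoted above can be written out as a small helper. This is a hypothetical sketch (legacy_loc_list is not a pandas function) that reproduces the 0.17-era list behaviour of loc on a recent pandas, where loc itself no longer tolerates missing labels in a list:

```python
import pandas as pd


def legacy_loc_list(frame, labels):
    """Hypothetical helper emulating the old .loc rule for a list of labels.

    Raises KeyError only if *none* of the labels are in the index;
    otherwise returns one row per requested label, NaN-filled where missing.
    """
    if not any(label in frame.index for label in labels):
        raise KeyError(f"None of [{labels}] are in the [index]")
    return frame.reindex(labels)


df = pd.DataFrame({'x1': [10.0, 12.3, 13.4, 11.9, 7.6],
                   'x2': ['a', 'a', 'b', 'c', 'c']},
                  index=pd.Index([10, 9, 5, 6, 8], name='id'))

print(legacy_loc_list(df, [10, 9, 7]))  # rows 10 and 9, plus a NaN row for 7
try:
    legacy_loc_list(df, [7])            # no requested label exists
except KeyError as err:
    print("KeyError:", err)
```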
For ix I am less sure, and I don't think this is clearly documented. In any case, ix is much more permissive, has a lot of edge cases (fallback to integer position, etc.), and is rather a rabbit hole. In general, though, ix will always return a result indexed with the provided labels (so it does not check whether the labels are in the index, as loc does), unless it falls back to integer-position indexing.
In most cases it is advised to use loc/iloc.
I think this behavior is intended, not a bug.
Although I couldn't find any official documentation, I found a comment by jreback on 21 Mar 2014 on a GitHub issue indicating this.
ix can very subtly give wrong results (use an index of say even numbers)
you can use whatever function you want; ix is still there, but it doesn't provide the guarantees that loc provides, namely that it won't interpret a number as a location
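jreback's "index of even numbers" point can be illustrated without ix: when the index holds even integers, the same integer names one row as a label and a different row as a position. loc and iloc each commit to one interpretation; ix had to guess. A small sketch (the Series is made up for illustration):

```python
import pandas as pd

# An integer index of even numbers: labels and positions no longer line up
s = pd.Series(['a', 'b', 'c', 'd'], index=[0, 2, 4, 6])

by_label = s.loc[[2]]      # the row whose *label* is 2  -> 'b'
by_position = s.iloc[[2]]  # the row at *position* 2     -> 'c' (label 4)

print(by_label)
print(by_position)
```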
As for why it is designed this way, as mentioned in the docs:
.ix supports mixed integer and label based access. It is primarily label based, but will fall back to integer positional access unless the corresponding axis is of integer type.
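The documented fallback rule can be sketched for a single row key as follows. This is a hypothetical, simplified emulation (ix_like is not a pandas function and ignores most of the real .ix edge cases), written with loc and iloc so it runs on a recent pandas:

```python
import pandas as pd
from pandas.api.types import is_integer, is_integer_dtype


def ix_like(frame, key):
    """Hypothetical, simplified emulation of the documented .ix rule for one row key."""
    if key in frame.index:
        return frame.loc[key]       # label-based access comes first
    if is_integer(key) and not is_integer_dtype(frame.index):
        return frame.iloc[key]      # integer key on a non-integer axis: use position
    raise KeyError(key)             # integer axis: no positional fallback


letters = pd.DataFrame({'v': [1, 2, 3]}, index=['a', 'b', 'c'])
numbers = pd.DataFrame({'v': [1, 2, 3]}, index=[10, 20, 30])

print(ix_like(letters, 'b'))  # found as a label
print(ix_like(letters, 0))    # no label 0 on a string axis, so position 0 (row 'a')
try:
    ix_like(numbers, 0)       # integer axis: 0 is only tried as a label
except KeyError as err:
    print("KeyError:", err)
```

The last call shows where the ambiguity comes from: on an integer axis the same integer could reasonably mean a label or a position, so the rule has to pick one.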
In my opinion, raising a KeyError would be ambiguous as to whether it came from the index or from an integer position. Instead, ix returns NaN when given a list.