
Why do I get a PerformanceWarning for an indexed lookup on a sorted index?

Tags:

python

pandas

Does anyone know why this gives a PerformanceWarning?

import pandas as pd

d = pd.DataFrame(
    [
        [1, 2, 3],
        [1, 2, 4],
        [1, None, 5],  # None becomes NaN in the 'j' index level
        [2, 3, 5],
    ],
    columns=['i', 'j', 'k']
)
print(d.dtypes)
d = d.set_index(['i', 'j'])['k']
d = d.sort_index()

print(d.loc[(2, 3)])  # PerformanceWarning: indexing past lexsort depth may impact performance.

My understanding from the docs is that this PerformanceWarning is raised when the index has not been sorted, but here the index was sorted before the lookup.

user48956 asked Feb 19 '18 05:02



1 Answer

It turns out this is an open bug:

  • https://github.com/pandas-dev/pandas/issues/19771
  • https://github.com/pandas-dev/pandas/issues/17931

@ayhan's comment provides a workaround:

d = d.sort_index(level=d.index.names)

which should be the default behavior of:

d = d.sort_index()
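
To compare the two calls on your own pandas version, a rough sketch like the one below may help. It rebuilds the Series from the question, sorts it both ways, and records whether a PerformanceWarning is emitted during the lookup. The helper name `lookup_recording_warnings` is made up for this illustration, and whether either variant actually warns is likely to depend on the installed pandas version, so no particular outcome is assumed here.

```python
import warnings

import numpy as np
import pandas as pd
from pandas.errors import PerformanceWarning


def lookup_recording_warnings(series, key):
    """Hypothetical helper: run series.loc[key] and report any PerformanceWarning."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure nothing is suppressed
        result = series.loc[key]
    warned = any(issubclass(w.category, PerformanceWarning) for w in caught)
    # .loc may return a scalar or a one-row Series depending on the index,
    # so normalise the result to a flat list of values.
    return np.asarray(result).ravel().tolist(), warned


df = pd.DataFrame(
    [[1, 2, 3], [1, 2, 4], [1, None, 5], [2, 3, 5]],
    columns=["i", "j", "k"],
)
s = df.set_index(["i", "j"])["k"]

plain = s.sort_index()                        # the call from the question
explicit = s.sort_index(level=s.index.names)  # the workaround from the comment

for name, candidate in [("sort_index()", plain), ("sort_index(level=...)", explicit)]:
    values, warned = lookup_recording_warnings(candidate, (2, 3))
    print(name, "->", values, "PerformanceWarning:", warned)
```

Both variants should return the same value for the key (2, 3); only the warning flag is expected to differ, and only on versions affected by the bug.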
user48956 answered Oct 20 '22 02:10