Does anyone know why this gives a PerformanceWarning?
import pandas as pd

d = pd.DataFrame(
    [
        [1, 2, 3],
        [1, 2, 4],
        [1, None, 5],
        [2, 3, 5],
    ],
    columns=['i', 'j', 'k']
)
print(d.dtypes)  # 'j' is float64 because the None becomes NaN
d = d.set_index(['i', 'j'])['k']
d = d.sort_index()
print(d.loc[(2, 3)])  # PerformanceWarning: indexing past lexsort depth may impact performance.
My understanding from the docs is that this PerformanceWarning is raised when the index has not been sorted, but here the index was sorted.
The MultiIndex object is the hierarchical analogue of the standard Index object which typically stores the axis labels in pandas objects. You can think of MultiIndex as an array of tuples where each tuple is unique. A MultiIndex can be created from a list of arrays (using MultiIndex.
To make a column the index, use the pandas set_index() method. To use a single column as the index, pass its name as a string to set_index(). For multi-indexing (hierarchical indexing), pass a list of column names to set_index().
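As a rough illustration of the two set_index() call styles described above, here is a minimal sketch. The frame `df` and its values are made up for the example and are not the data from the question.

```python
import pandas as pd

# Hypothetical sample data, not taken from the question.
df = pd.DataFrame({"i": [1, 1, 2], "j": [2, 3, 3], "k": [3, 4, 5]})

single = df.set_index("i")        # one column name as a string -> ordinary index named "i"
multi = df.set_index(["i", "j"])  # list of column names -> MultiIndex with levels "i" and "j"

print(single.index.name)
print(multi.index.names)
print(list(multi.index))  # the MultiIndex viewed as a sequence of (i, j) tuples
```

Iterating over `multi.index` yields one tuple per row, which matches the "array of tuples" way of thinking about a MultiIndex.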
It turns out this is an open bug:
@ayhan's comment provides a workaround:
d = d.sort_index(level=d.index.names)
which should be the default behavior of:
d = d.sort_index()
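To check whether the workaround makes a difference on your installation, something like the sketch below may help. It rebuilds the Series from the question and records any PerformanceWarning raised by each lookup. The helper name `lookup` is made up for this example, and whether a warning shows up at all likely depends on your pandas version, so no particular output is assumed.

```python
import warnings
import pandas as pd

def lookup(series, key):
    """Return series.loc[key] plus any PerformanceWarning messages it raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure nothing is suppressed
        value = series.loc[key]
    messages = [str(w.message) for w in caught
                if issubclass(w.category, pd.errors.PerformanceWarning)]
    return value, messages

# Same data as in the question.
d = pd.DataFrame(
    [[1, 2, 3], [1, 2, 4], [1, None, 5], [2, 3, 5]],
    columns=["i", "j", "k"],
).set_index(["i", "j"])["k"]

default_value, default_msgs = lookup(d.sort_index(), (2, 3))
explicit_value, explicit_msgs = lookup(d.sort_index(level=d.index.names), (2, 3))

print("sort_index():", default_msgs)
print("sort_index(level=names):", explicit_msgs)
```

An empty list means the lookup raised no PerformanceWarning for that sorting variant.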