In pandas you can replace the default integer-based index with an index made up of any number of columns using set_index().
What confuses me, though, is when you would want to do this. Regardless of whether the series is a column or part of the index, you can filter values in the series using boolean indexing for columns, or xs() for rows. You can sort on the columns or index using either sort_values() or sort_index().
The only real difference I've encountered is that indexes have issues when there are duplicate values, so it seems that using an index is more restrictive, if anything.
Why, then, would I want to convert my columns into an index in pandas?
iloc is an indexer attribute of pandas DataFrame and Series objects that selects rows and columns by integer position. It is used with square brackets rather than called as a function. With iloc you can retrieve any particular value from a row or column by its position.
set_index() sets a new index on the DataFrame. It can also extend the existing index: passing append=True to DataFrame.set_index() adds the new index level to the existing one instead of replacing it.
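To make the append behaviour concrete, here is a minimal sketch. The DataFrame, its column names, and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical example data, not taken from the question.
df = pd.DataFrame({
    "city": ["Oslo", "Rome", "Lima"],
    "year": [2020, 2021, 2022],
    "sales": [10, 20, 30],
})

# Replace the default integer index with the "city" column.
by_city = df.set_index("city")

# append=True keeps "city" and adds "year" as a second index level.
both = by_city.set_index("year", append=True)

index_names = list(both.index.names)
rome_2021_sales = both.loc[("Rome", 2021), "sales"]
print(index_names, rome_2021_sales)
```

Without append=True, the second call would drop "city" from the index and discard it entirely, since set_index() does not move the old index back into the columns.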
By default, the index is a range of integers starting at zero (a RangeIndex). If you don't explicitly define an index when you create your DataFrame, pandas creates this one for you.
The main distinction between loc and iloc is: loc is label-based, which means that you have to specify rows and columns based on their row and column labels. iloc is integer position-based, so you have to specify rows and columns by their integer position values (0-based integer position).
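A small sketch of the difference, using a made-up DataFrame whose labels are deliberately not in positional order:

```python
import pandas as pd

# Hypothetical data: labels "b", "c", "a" sit at positions 0, 1, 2.
df = pd.DataFrame({"sales": [10, 20, 30]}, index=["b", "c", "a"])

by_label = df.loc["a", "sales"]   # row labelled "a"
by_position = df.iloc[0, 0]       # first row, first column

# Label slices include both endpoints; position slices exclude the stop.
label_slice = df.loc["b":"c"]
position_slice = df.iloc[0:1]

print(by_label, by_position, len(label_slice), len(position_slice))
```

The slicing difference is the one that tends to surprise people: loc["b":"c"] returns two rows here, while iloc[0:1] returns one.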
In my opinion custom indexes are good for quickly selecting data.
They're also useful for aligning data for mapping, for arithmetic operations where the index is used for data alignment, for joining data, and for getting minimal or maximal rows per group.
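Two of these uses can be sketched briefly: index alignment in arithmetic, and picking the maximal row per group with idxmax(). All names and values below are invented for illustration.

```python
import pandas as pd

# Arithmetic aligns on index labels, not on position.
a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["z", "x", "y"])
total = a + b  # x: 1 + 20, y: 2 + 30, z: 3 + 10

# idxmax() returns index labels, which loc can then use to fetch whole rows.
scores = pd.DataFrame(
    {"team": ["r", "r", "s", "s"], "points": [5, 9, 7, 3]},
    index=["ann", "bob", "cid", "dee"],
)
best_labels = scores.groupby("team")["points"].idxmax()
best_rows = scores.loc[best_labels]

print(total)
print(best_rows)
```

With a meaningful index (here, the player names) the result of idxmax() is directly readable, which is part of the appeal of a custom index.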
DatetimeIndex is nice for partial string indexing and for resampling.
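For example, assuming a small daily series that spans a month boundary (the dates and values are made up):

```python
import pandas as pd

# Four consecutive days: Jan 30, Jan 31, Feb 1, Feb 2.
idx = pd.date_range("2023-01-30", periods=4, freq="D")
ts = pd.Series([1, 2, 3, 4], index=idx)

# Partial string indexing: "2023-01" selects every row in January 2023.
january = ts.loc["2023-01"]

# Resampling: aggregate the daily values into month-start bins.
monthly = ts.resample("MS").sum()

print(january)
print(monthly)
```

Neither operation is available when the dates live in an ordinary column; resample() can take an on= argument for a column on a DataFrame, but partial string selection needs the DatetimeIndex.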
But you are right, a duplicate index is problematic, especially for reindexing.
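A short sketch of the duplicate-label problem, with invented data. The exact error message varies between pandas versions, so the code only checks for the exception type.

```python
import pandas as pd

# Hypothetical series with the label "a" appearing twice.
dup = pd.Series([1, 2, 3], index=["a", "a", "b"])

is_unique = dup.index.is_unique

# A label lookup on a duplicated label returns several rows, not a scalar.
rows_for_a = dup.loc["a"]

# Reindexing to a different set of labels raises ValueError.
try:
    dup.reindex(["a", "b", "c"])
    reindex_failed = False
except ValueError:
    reindex_failed = True

print(is_unique, len(rows_for_a), reindex_failed)
```

Checking index.is_unique before relying on label lookups is a cheap way to catch this early.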
Docs:
- Identifies data (i.e. provides metadata) using known indicators, important for analysis, visualization, and interactive console display
- Enables automatic and explicit data alignment
- Allows intuitive getting and setting of subsets of the data set
Also you can check Modern pandas - Indexes, direct link.