
When to use a custom index rather than ordinary columns in Pandas


In pandas you can replace the default integer-based index with an index made up of any number of columns using set_index().

What confuses me, though, is when you would want to do this. Regardless of whether the series is a column or part of the index, you can filter values in the series using boolean indexing for columns, or xs() for rows. You can sort on the columns or index using either sort_values() or sort_index().

The only real difference I've encountered is that indexes have issues when there are duplicate values, so it seems that using an index is more restrictive, if anything.

Why then, would I want to convert my columns into an index in Pandas?

Migwell asked Jun 20 '17 05:06




1 Answer

In my opinion custom indexes are good for quickly selecting data.
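
A minimal sketch of what "quickly selecting" can look like, assuming a small made-up DataFrame (the `city` and `population_m` names and values are invented for illustration). It compares a boolean mask on a column with a label lookup through `.loc` once that column is the index:

```python
import pandas as pd

# Hypothetical data, invented for this example
df = pd.DataFrame({
    "city": ["Oslo", "Lima", "Cairo"],
    "population_m": [0.7, 10.0, 9.5],
})

# Column version: build a boolean mask, then filter the rows
by_mask = df[df["city"] == "Lima"]

# Index version: set the column as the index, then look up by label
indexed = df.set_index("city")
by_label = indexed.loc["Lima"]                   # the whole row, as a Series
lima_pop = indexed.loc["Lima", "population_m"]   # a single scalar value
```

Both approaches find the same row; the index version reads as a lookup rather than a filter, and a scalar can be pulled out in one step.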

They're also useful for aligning data: for mapping, for arithmetic operations where the index is used to align the operands, for joining data, and for getting the minimal or maximal rows per group.
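
As a rough illustration of two of these points, assuming made-up data (all names and values below are hypothetical): arithmetic between two Series matches values by index label rather than by position, and `idxmax()` returns index labels that can be passed straight to `.loc` to get the maximal row per group.

```python
import pandas as pd

# Hypothetical series whose labels are in a different order
price = pd.Series([2.0, 3.0, 5.0], index=["apple", "pear", "kiwi"])
qty = pd.Series([10, 20, 30], index=["kiwi", "apple", "pear"])

# The multiplication pairs values by label, not by position
total = price * qty

# Maximal row per group: idxmax() gives the index label of each group's maximum
scores = pd.DataFrame(
    {"team": ["a", "a", "b", "b"], "score": [1, 4, 3, 2]},
    index=["w", "x", "y", "z"],
)
top_per_team = scores.loc[scores.groupby("team")["score"].idxmax()]
```

Here `total["apple"]` is the apple price times the apple quantity, even though the two values sit at different positions in their respective Series.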

DatetimeIndex is nice for partial string indexing and for resampling.
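
A short sketch of both features, assuming a hypothetical daily series covering the first 90 days of 2017 (the data is invented for illustration):

```python
import pandas as pd

# Hypothetical daily data: 2017-01-01 through 2017-03-31
idx = pd.date_range("2017-01-01", periods=90, freq="D")
ts = pd.Series(range(90), index=idx)

# Partial string indexing: "2017-02" selects every day in February 2017
feb = ts.loc["2017-02"]

# Resampling: aggregate the daily values into calendar-month sums
# ("MS" labels each bin by the first day of the month)
monthly = ts.resample("MS").sum()
```

Neither of these works on a plain column of dates without extra steps, such as building a mask from `.dt.month` or grouping with `pd.Grouper`.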

But you are right, a duplicate index is problematic, especially for reindexing.
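
To make the duplicate-index problem concrete, here is a small sketch with invented data. As far as I know, `reindex()` on an axis with duplicate labels raises a `ValueError`; the exact message varies between pandas versions, so the code only checks the exception type:

```python
import pandas as pd

# Hypothetical series with a repeated label "a"
s = pd.Series([1, 2, 3], index=["a", "a", "b"])

has_unique_index = s.index.is_unique   # False for this series

# A label lookup on a duplicated label returns every matching row
both_a = s.loc["a"]

# Reindexing cannot decide which "a" to use, so it raises
try:
    s.reindex(["a", "b", "c"])
    reindex_failed = False
except ValueError:
    reindex_failed = True
```

Checking `index.is_unique` before calling `reindex()` or joining is a cheap way to catch this early.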

From the docs, the index:

  • Identifies data (i.e. provides metadata) using known indicators, important for analysis, visualization, and interactive console display
  • Enables automatic and explicit data alignment
  • Allows intuitive getting and setting of subsets of the data set

You can also check Modern pandas - Indexes (direct link).

jezrael answered Sep 21 '22 11:09
