
Converting Index into MultiIndex (hierarchical index) in Pandas

Tags: python, pandas

In the data I am working with, the index is compound: each label combines an item name and a timestamp, e.g. [email protected]|2013-05-07 05:52:51 +0200.

I want to use hierarchical indexing so that rows for the same e-mail are grouped together, so I need to convert the DataFrame's Index into a MultiIndex (for the entry above: ([email protected], 2013-05-07 05:52:51 +0200)).

What is the most convenient method to do so?

asked Jul 23 '13 19:07 by Piotr Migdal




1 Answer

Once we have a DataFrame

import pandas as pd
df = pd.read_csv("input.csv", index_col=0)  # or from another source

and a function that maps each index label to a tuple (the one below handles the example from this question)

def process_index(k):
    return tuple(k.split("|"))

we can create a hierarchical index in the following way:

df.index = pd.MultiIndex.from_tuples([process_index(k) for k in df.index])
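
To try this without an input file, here is a minimal self-contained sketch of the same from_tuples approach. The sample labels, the "value" column, and the level names "e-mail" and "date" are made up for illustration and are not from the original data.

```python
import pandas as pd

# Hypothetical sample data standing in for input.csv; the labels are invented.
df = pd.DataFrame(
    {"value": [1, 2, 3]},
    index=[
        "alice@example.com|2013-05-07 05:52:51 +0200",
        "alice@example.com|2013-05-08 10:00:00 +0200",
        "bob@example.com|2013-05-07 06:15:00 +0200",
    ],
)

def process_index(k):
    # Split "name|timestamp" into a (name, timestamp) tuple.
    return tuple(k.split("|"))

# names= labels the two levels; these level names are illustrative.
df.index = pd.MultiIndex.from_tuples(
    [process_index(k) for k in df.index], names=["e-mail", "date"]
)

# Select all rows for one e-mail address via the outer level.
alice_rows = df.loc["alice@example.com"]
```

Naming the levels is optional, but it lets later code refer to a level by name rather than by position. Note that the timestamps are still plain strings at this point.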

An alternative approach is to create two columns and then set them as the index (the original index is replaced):

df['e-mail'] = [x.split("|")[0] for x in df.index] 
df['date'] = [x.split("|")[1] for x in df.index]
df = df.set_index(['e-mail', 'date'])

or, more concisely:

df['e-mail'], df['date'] = zip(*map(process_index, df.index))
df = df.set_index(['e-mail', 'date'])
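
If the timestamps should be real datetimes rather than strings, one possible variant (a sketch, not part of the approaches above) splits the labels with the vectorized string accessor and parses the second part with pd.to_datetime. The sample labels and the "value" column are invented, and converting to UTC is an assumption made here to get a single timezone-aware level.

```python
import pandas as pd

# Hypothetical sample data; the labels are invented for illustration.
df = pd.DataFrame(
    {"value": [1, 2]},
    index=[
        "alice@example.com|2013-05-07 05:52:51 +0200",
        "bob@example.com|2013-05-07 06:15:00 +0200",
    ],
)

# On an Index, str.split(..., expand=True) returns a MultiIndex of the parts.
parts = df.index.str.split("|", expand=True)

emails = parts.get_level_values(0)
# Parse the timestamp strings; utc=True converts every value to UTC (an assumption).
dates = pd.to_datetime(parts.get_level_values(1), utc=True)

df.index = pd.MultiIndex.from_arrays([emails, dates], names=["e-mail", "date"])
```

With a datetime level, date-based selection and sorting within each e-mail behave chronologically instead of lexicographically.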
answered Sep 26 '22 06:09 by Piotr Migdal