In the data I am working with the index is compound, i.e. it has both item name and a timestamp, e.g. [email protected]|2013-05-07 05:52:51 +0200.
I want to do hierarchical indexing, so that the same e-mails are grouped together, so I need to convert a DataFrame Index into a MultiIndex (e.g. for the entry above: ([email protected], 2013-05-07 05:52:51 +0200)).
What is the most convenient method to do so?
pandas MultiIndex to Columns
Use the pandas DataFrame.reset_index() function to convert the levels of a MultiIndex (multi-level index) to columns. The drop parameter defaults to False, which keeps the index values as columns and gives the DataFrame a new integer index starting from zero.
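As a minimal sketch of the reset_index() behaviour described above (the sample data and the level names "user" and "date" are made up for illustration):

```python
import pandas as pd

# Hypothetical sample data with a two-level index (names are illustrative).
index = pd.MultiIndex.from_tuples(
    [("alice", "2013-05-07"), ("alice", "2013-05-08"), ("bob", "2013-05-07")],
    names=["user", "date"],
)
df = pd.DataFrame({"value": [1, 2, 3]}, index=index)

# drop=False (the default) turns both index levels into ordinary columns
# and replaces the index with a fresh integer index starting at 0.
flat = df.reset_index()

# drop=True discards the index levels instead of keeping them as columns.
dropped = df.reset_index(drop=True)
```

Here flat has the columns user, date and value, while dropped keeps only value.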
To make a column the index, use the pandas set_index() function. To make a single column the index, pass its name as a string to set_index(). For multi-indexing (hierarchical indexing), pass a list of column names to set_index().
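A small sketch of both set_index() calls, assuming an illustrative DataFrame with made-up column names:

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "user": ["alice", "alice", "bob"],
    "date": ["2013-05-07", "2013-05-08", "2013-05-07"],
    "value": [1, 2, 3],
})

# A single column name gives a regular (one-level) index.
single = df.set_index("user")

# A list of column names gives a MultiIndex with one level per column.
multi = df.set_index(["user", "date"])

# Rows can then be looked up by a tuple of level values.
value = multi.loc[("alice", "2013-05-08"), "value"]
```

Note that set_index() returns a new DataFrame by default, so the result has to be assigned.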
Creating a MultiIndex (hierarchical index) object
A MultiIndex can be created from a list of arrays (using MultiIndex.from_arrays()), an array of tuples (using MultiIndex.from_tuples()), a crossed set of iterables (using MultiIndex.from_product()), or a DataFrame (using MultiIndex.
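For illustration, the first three constructors named above can be used to build the same index. This is only a sketch; the level names "letter" and "number" are invented for the example:

```python
import pandas as pd

names = ["letter", "number"]  # illustrative level names

# One array per level, all of the same length.
from_arrays = pd.MultiIndex.from_arrays(
    [["a", "a", "b", "b"], [1, 2, 1, 2]], names=names
)

# One tuple per row.
from_tuples = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("b", 1), ("b", 2)], names=names
)

# Cartesian product of the given iterables.
from_product = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=names)
```

All three calls should yield the same four entries, so the choice mostly depends on the shape of the data you start with.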
In this example, we create a MultiIndex from a DataFrame using pandas. We build some data by hand, construct a DataFrame from it with pd.DataFrame, and then create a MultiIndex from that DataFrame.
Once we have a DataFrame
import pandas as pd
df = pd.read_csv("input.csv", index_col=0) # or from another source
and a function mapping each index to a tuple (below, it is for the example from this question)
def process_index(k):
    return tuple(k.split("|"))
we can create a hierarchical index in the following way:
df.index = pd.MultiIndex.from_tuples([process_index(k) for k in df.index])
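To try this without a CSV file, here is a self-contained sketch. The example.com addresses, the "value" column and the level names are placeholders invented for illustration, not data from the question:

```python
import pandas as pd

def process_index(key):
    """Split 'name|timestamp' into a (name, timestamp) tuple."""
    return tuple(key.split("|"))

# Hypothetical data with a compound 'e-mail|timestamp' string index.
df = pd.DataFrame(
    {"value": [10, 20, 30]},
    index=[
        "user1@example.com|2013-05-07 05:52:51 +0200",
        "user1@example.com|2013-05-08 09:00:00 +0200",
        "user2@example.com|2013-05-07 12:30:00 +0200",
    ],
)

# Replace the flat index with a two-level MultiIndex.
df.index = pd.MultiIndex.from_tuples(
    [process_index(k) for k in df.index], names=["e-mail", "date"]
)

# Rows for one address can now be selected through the first level.
user1_rows = df.loc["user1@example.com"]
```

After the conversion, user1_rows holds the two rows for the first address, indexed by the remaining date level.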
An alternative approach is to create two columns then set them as the index (the original index will be dropped):
df['e-mail'] = [x.split("|")[0] for x in df.index]
df['date'] = [x.split("|")[1] for x in df.index]
df = df.set_index(['e-mail', 'date'])
or even shorter
df['e-mail'], df['date'] = zip(*map(process_index, df.index))
df = df.set_index(['e-mail', 'date'])
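Another option, not part of the answer above, is to let pandas do the splitting: Index.str.split() with expand=True returns a MultiIndex directly. The sketch below uses placeholder example.com addresses and also parses the timestamps. Converting them to UTC with utc=True is an assumption made here to keep a single time zone:

```python
import pandas as pd

# Hypothetical data with a compound 'e-mail|timestamp' string index.
df = pd.DataFrame(
    {"value": [10, 20, 30]},
    index=[
        "user1@example.com|2013-05-07 05:52:51 +0200",
        "user1@example.com|2013-05-08 09:00:00 +0200",
        "user2@example.com|2013-05-07 12:30:00 +0200",
    ],
)

# expand=True makes str.split return a MultiIndex instead of lists.
df.index = df.index.str.split("|", expand=True)
df.index.names = ["e-mail", "date"]

# Optionally parse the date level into timestamps (converted to UTC here).
dates = pd.to_datetime(
    df.index.get_level_values("date"),
    format="%Y-%m-%d %H:%M:%S %z",
    utc=True,
)
df.index = pd.MultiIndex.from_arrays(
    [df.index.get_level_values("e-mail"), dates], names=["e-mail", "date"]
)
```

Parsing the date level is optional, but it allows time-based comparisons on that level instead of plain string matching.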