
Make a Pandas MultiIndex from a product of iterables?

Tags: python, pandas

I have a utility function for creating a Pandas MultiIndex when I have two or more iterables and want an index key for each unique combination of the values in those iterables. It looks like this:

import pandas as pd
import itertools

def product_index(values, names=None):
    """Make a MultiIndex from the combinatorial product of the values."""
    iterable = itertools.product(*values)
    idx = pd.MultiIndex.from_tuples(list(iterable), names=names)
    return idx

And could be used like:

a = range(3)
b = list("ab")
product_index([a, b])

To create

MultiIndex(levels=[[0, 1, 2], [u'a', u'b']],
           labels=[[0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1]])

This works perfectly fine, but it seems like a common use case, and I am surprised I had to implement it myself. So my question is: what have I missed or misunderstood in the Pandas library itself that offers this functionality?

Edit to add: This function has been added to Pandas as MultiIndex.from_product for the 0.13.1 release.
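For readers on a release that includes it, the built-in constructor mentioned in the edit above can be called roughly as follows. This is a minimal sketch: the level names "number" and "letter" are made up for illustration and are not part of the original question.

```python
import itertools
import pandas as pd

a = range(3)
b = list("ab")

# Built-in constructor (added in pandas 0.13.1, per the edit above).
# The level names "number" and "letter" are illustrative only.
idx = pd.MultiIndex.from_product([a, b], names=["number", "letter"])

# The same keys built the manual way from the question, for comparison.
manual = pd.MultiIndex.from_tuples(
    list(itertools.product(a, b)), names=["number", "letter"]
)

# Check that both constructions produce the same index.
print(idx.equals(manual))
```

Both calls enumerate the pairs in the same order, with the last iterable varying fastest.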

mwaskom asked Jan 23 '14 18:01



1 Answer

This is a very similar construction (but using cartesian_product, which is faster than itertools.product for larger arrays):

In [2]: from pandas.tools.util import cartesian_product

In [3]: MultiIndex.from_arrays(cartesian_product([range(3),list('ab')]))
Out[3]: 
MultiIndex(levels=[[0, 1, 2], [u'a', u'b']],
           labels=[[0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1]])

This could be added as a convenience method, maybe MultiIndex.from_iterables(...).

Please open an issue (and a PR if you'd like).
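The import path pandas.tools.util used above may not exist in every pandas release. As a rough sketch of the same idea (expand the inputs into aligned arrays, then hand them to MultiIndex.from_arrays), the version below swaps in numpy.meshgrid for cartesian_product. The helper name cartesian_arrays is hypothetical and is not a pandas function.

```python
import numpy as np
import pandas as pd

def cartesian_arrays(iterables):
    """Hypothetical stand-in for cartesian_product, built on numpy.meshgrid.

    Returns one flat array per input, aligned so that position i across
    all arrays forms the i-th element of the Cartesian product.
    """
    arrays = [np.asarray(list(it)) for it in iterables]
    # indexing="ij" makes the last input vary fastest, like itertools.product.
    grids = np.meshgrid(*arrays, indexing="ij")
    return [grid.ravel() for grid in grids]

idx = pd.MultiIndex.from_arrays(cartesian_arrays([range(3), list("ab")]))
print(idx.tolist())
```

Working on whole arrays avoids building a Python tuple for every key, which is the likely reason the array-based route is faster for large inputs.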

FYI, I very rarely construct a MultiIndex 'manually'; it is almost always easier to construct a frame and just call set_index.

In [10]: df = DataFrame(dict(A = np.arange(6), 
                             B = ['foo'] * 3 + ['bar'] * 3, 
                             C = np.ones(6)+np.arange(6)%2)
                       ).set_index(['C','B']).sortlevel()

In [11]: df
Out[11]: 
       A
C B     
1 bar  4
  foo  0
  foo  2
2 bar  3
  bar  5
  foo  1

[6 rows x 1 columns]
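On recent pandas releases, where sortlevel is no longer available, the same frame can be built as a standalone script along these lines. This is a sketch that assumes sort_index is an acceptable replacement for sortlevel here; the column names and data are the ones from the session above.

```python
import numpy as np
import pandas as pd

df = (
    pd.DataFrame(
        {
            "A": np.arange(6),
            "B": ["foo"] * 3 + ["bar"] * 3,
            "C": np.ones(6) + np.arange(6) % 2,
        }
    )
    # Move the C and B columns into a two-level row index.
    .set_index(["C", "B"])
    # Sort rows by the index levels (used here in place of sortlevel).
    .sort_index()
)

print(df)
```

The MultiIndex falls out of the data itself, so there is no separate index-construction step to keep in sync with the frame.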
Jeff answered Sep 22 '22 22:09