
Converting tuples to multiple indices in a Pandas Dataframe

Tags: python, pandas

I'm starting with a dictionary like this:

dict = {(100000550, 'ActivityA'): {'bar__sum': 14.0, 'foo__sum': 12.0},
        (100001799, 'ActivityB'): {'bar__sum': 7.0, 'foo__sum': 3.0}}

When I convert it to a DataFrame and transpose it, the (id, activity type) tuples end up as the row labels of a flat index:

df = DataFrame(dict).transpose()

                        bar__sum  foo__sum
(100000550, ActivityA)        14        12
(100001799, ActivityB)         7         3

How can I convert the tuples in the index to a MultiIndex, i.e. so that the end result looks like this instead:

                        bar__sum  foo__sum
id        act_type
100000550 ActivityA        14        12
100001799 ActivityB         7         3

What's the best way to do this? Is there some option at DataFrame creation that I'm missing? Or should it happen via a list comprehension, which feels inefficient to me?

asked Nov 22 '13 19:11 by Twain




1 Answer

To convert the existing index of your DataFrame, build a MultiIndex from its tuples and then name the levels:

>>> df.index = pd.MultiIndex.from_tuples(df.index)
>>> df
                     bar__sum  foo__sum
100000550 ActivityA        14        12
100001799 ActivityB         7         3

>>> df.index.names = ['id', 'act_type']
>>> df
                     bar__sum  foo__sum
id        act_type                     
100000550 ActivityA        14        12
100001799 ActivityB         7         3
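
For readers on Python 3, the same conversion can be sketched as a self-contained script. It assumes pandas is imported as pd, and the name d is only illustrative. The flat tuple index is rebuilt by hand with pd.Index(..., tupleize_cols=False) purely to reproduce the starting point in the question, since recent pandas versions may build a MultiIndex from tuple keys on their own.

```python
import pandas as pd

# Sample data from the question (named d so the built-in dict is not shadowed).
d = {(100000550, 'ActivityA'): {'bar__sum': 14.0, 'foo__sum': 12.0},
     (100001799, 'ActivityB'): {'bar__sum': 7.0, 'foo__sum': 3.0}}

# Reproduce the starting point: one flat index whose labels are tuples.
flat_index = pd.Index(list(d.keys()), tupleize_cols=False)
df = pd.DataFrame(list(d.values()), index=flat_index)

# Convert the tuples to a two-level MultiIndex and name the levels in one step.
df.index = pd.MultiIndex.from_tuples(df.index, names=['id', 'act_type'])

print(df)
```

Passing names= to from_tuples is a shortcut for the separate df.index.names assignment shown above.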

You can also create the DataFrame directly from the dictionary. Here d is your dictionary; don't name the variable dict, since that shadows Python's built-in dict type:

>>> pd.DataFrame(list(d.values()), index=pd.MultiIndex.from_tuples(list(d.keys()), names=['id', 'act_type']))
                     bar__sum  foo__sum
id        act_type                     
100001799 ActivityB         7         3
100000550 ActivityA        14        12

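Note that keys() and values() iterate in the same order as long as the dictionary is not modified in between, so each row gets the right labels. The row order itself follows the dictionary's iteration order, which is why ActivityB is listed first above.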
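
That alignment can be checked with a short sketch. It assumes Python 3 and pandas imported as pd, wraps the dictionary views in list() as a precaution, and uses d, keys, rows and aligned as illustrative names. The sort_index() call is an optional extra step, not part of the approach above, for when the row order should not depend on the dictionary.

```python
import pandas as pd

d = {(100000550, 'ActivityA'): {'bar__sum': 14.0, 'foo__sum': 12.0},
     (100001799, 'ActivityB'): {'bar__sum': 7.0, 'foo__sum': 3.0}}

# Materialise keys and values; both follow the same iteration order.
keys = list(d.keys())
rows = list(d.values())

df = pd.DataFrame(
    rows,
    index=pd.MultiIndex.from_tuples(keys, names=['id', 'act_type']),
)

# Every row should hold exactly the values stored under its own key.
aligned = all(df.loc[key].to_dict() == d[key] for key in keys)

# Optional: sort by (id, act_type) for a dictionary-independent row order.
df_sorted = df.sort_index()

print(aligned)
print(df_sorted)
```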

answered Sep 24 '22 13:09 by Roman Pekar