
How to create a multi-index in Pandas

Tags: python, pandas

Question

There are two questions that look similar but they're not the same question: here and here. They both call a method of GroupBy, such as count() or aggregate(), which I know returns a DataFrame. What I'm asking is how to convert the GroupBy (class pandas.core.groupby.DataFrameGroupBy) object itself into a DataFrame. I'll illustrate below.

Example

Construct an example DataFrame as follows.

import numpy
import pandas

# Build one row per (name, take) pair with random score and ping values.
data_list = []
for name in ["sasha", "asa"]:
    for take in ["one", "two"]:
        row = {"name": name, "take": take, "score": numpy.random.rand(), "ping": numpy.random.randint(10, 100)}
        data_list.append(row)
data = pandas.DataFrame(data_list)

The above DataFrame should look like the following (with different numbers obviously).

    name  ping     score take
0  sasha    72  0.923263  one
1  sasha    14  0.724720  two
2    asa    76  0.774320  one
3    asa    71  0.128721  two

What I want to do is to group by the columns "name" and "take" (in that order), so that I can get a DataFrame indexed by the multiindex constructed from the columns "name" and "take", like below.

               score  ping
name  take
sasha one   0.923263    72
      two   0.724720    14
asa   one   0.774320    76
      two   0.128721    71

How do I achieve that? If I do grouped = data.groupby(["name", "take"]), then grouped is a pandas.core.groupby.DataFrameGroupBy instance. What is the correct way of doing this?

Ray asked Oct 25 '16 09:10


1 Answer

You don't need groupby for this. Use set_index, which builds the MultiIndex directly from the given columns:

data = data.set_index(['name','take'])
print(data)
            ping     score
name  take                
sasha one     46  0.509177
      two     77  0.828984
asa   one     51  0.637451
      two     51  0.658616
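
For reference, here is a self-contained sketch of the same idea that can be run as-is. It rebuilds the question's example frame, so the seeded random generator and the variable names (rng, rows, indexed, restored, aggregated) are assumptions for illustration, not part of the original post. It also shows that a groupby aggregation with sort=False ends up with the same index, which may be why groupby looked like the natural tool.

```python
import numpy as np
import pandas as pd

# Rebuild the question's example; the fixed seed is an assumption for repeatability.
rng = np.random.default_rng(0)
rows = [
    {"name": name, "take": take, "score": rng.random(), "ping": int(rng.integers(10, 100))}
    for name in ["sasha", "asa"]
    for take in ["one", "two"]
]
data = pd.DataFrame(rows)

# set_index returns a new frame indexed by (name, take); data itself is unchanged.
indexed = data.set_index(["name", "take"])
print(indexed)

# Select a single row with a full (name, take) tuple.
print(indexed.loc[("sasha", "two")])

# reset_index moves the index levels back into ordinary columns.
restored = indexed.reset_index()

# A groupby aggregation also produces a (name, take) MultiIndex; sort=False keeps
# the groups in order of first appearance instead of sorting the keys.
aggregated = data.groupby(["name", "take"], sort=False).sum()
print(aggregated.index.names)
```

Since every (name, take) pair occurs exactly once here, the aggregation has nothing to combine, so set_index is the more direct choice. If pairs could repeat, set_index would keep the duplicate rows while groupby would collapse them.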
jezrael answered Sep 19 '22 13:09