 

Convert MultiIndex DataFrame to Series

I created a MultiIndex DataFrame with:

df.set_index(['Field1', 'Field2'], inplace=True)

If this does not create a MultiIndex DataFrame, please tell me how to make one.

I want to:

  • Group by the same columns that are in the index
  • Aggregate a count of each group
  • Then return the whole thing as a Series with Field1 and Field2 as the index

How do I go about doing this?

ADDITIONAL INFO

I have a MultiIndex DataFrame that looks like this:

Continent     Sector                Count     
Asia          1                     4
              2                     1
Australia     1                     1
Europe        1                     1
              2                     3
              3                     2
North America 1                     1
              5                     1
South America 5                     1

How can I return this as a Series indexed by [Continent, Sector]?

Alex, asked Dec 13 '16 06:12


People also ask

How do you convert a data frame to a series?

To convert a specific column of a pandas DataFrame to a Series, select it with integer-location-based indexing, df.iloc[:, i]. For example, df.iloc[:, 0] returns the first column as a Series, and df.iloc[:, 2] or df.iloc[:, -1] returns the third (or, in a three-column frame, the last) column.
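A minimal sketch of positional column selection; the frame and its column names A, B, C are made up for illustration:

```python
import pandas as pd

# Hypothetical three-column frame, used only for illustration
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6]})

first = df.iloc[:, 0]   # first column as a Series (name 'A')
third = df.iloc[:, 2]   # third column as a Series (name 'C')
last = df.iloc[:, -1]   # last column; same as third here

print(type(last).__name__)  # Series
print(last.tolist())        # [5, 6]
```

Negative positions count from the right, so df.iloc[:, -1] keeps working if columns are added later, whereas a fixed position like 2 does not.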

How do I convert MultiIndex to single index in Pandas?

To revert a DataFrame's index from a MultiIndex to a single index, use the built-in pandas method reset_index(). It returns a DataFrame with the new index, or None if inplace=True.
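A short sketch of both variants: resetting every level (giving a default RangeIndex) or only one level via the level parameter. The data reuses the Field1/Field2 names from the question:

```python
import pandas as pd

df = pd.DataFrame({'Field1': [1, 1, 1], 'Field2': [4, 4, 6], 'C': [7, 8, 9]})
mi = df.set_index(['Field1', 'Field2'])   # two-level MultiIndex

flat = mi.reset_index()                   # all levels back to columns, RangeIndex
partial = mi.reset_index(level='Field2')  # only Field2 moves to a column

print(flat.columns.tolist())   # ['Field1', 'Field2', 'C']
print(partial.index.name)      # Field1
```

With level=, the remaining single level becomes an ordinary (non-Multi) Index.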


1 Answer

I think you need groupby with the size aggregation, which counts the rows in each group:

import pandas as pd

df = pd.DataFrame({'Field1':[1,1,1],
                   'Field2':[4,4,6],
                   'C':[7,8,9],
                   'D':[1,3,5],
                   'E':[5,3,6],
                   'F':[7,4,3]})


df.set_index(['Field1', 'Field2'], inplace=True)
print (df)
               C  D  E  F
Field1 Field2            
1      4       7  1  5  7
       4       8  3  3  4
       6       9  5  6  3

print (df.index)
MultiIndex(levels=[[1], [4, 6]],
           labels=[[0, 0, 0], [0, 0, 1]],
           names=['Field1', 'Field2'])
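The labels= shown above is from an older pandas; since pandas 0.24 that attribute is called codes. A version-independent way to confirm that set_index produced a MultiIndex (a sketch, assuming pandas 0.24 or newer for the codes line):

```python
import pandas as pd

df = pd.DataFrame({'Field1': [1, 1, 1],
                   'Field2': [4, 4, 6],
                   'C': [7, 8, 9]})
df = df.set_index(['Field1', 'Field2'])

print(isinstance(df.index, pd.MultiIndex))  # True
print(df.index.nlevels)                     # 2
print(list(df.index.names))                 # ['Field1', 'Field2']
print(df.index.codes)                       # integer codes per level
```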

print (df.groupby(level=[0,1]).size())
Field1  Field2
1       4         2
        6         1
dtype: int64

print (df.groupby(level=['Field1', 'Field2']).size())
Field1  Field2
1       4         2
        6         1
dtype: int64

print (df.groupby(level=['Field1', 'Field2']).count())
               C  D  E  F
Field1 Field2            
1      4       2  2  2  2
       6       1  1  1  1

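For the difference between the two, see "What is the difference between size and count in pandas?": size counts all rows in each group (NaN included) and returns a Series, while count counts the non-NaN values of each column per group and returns a DataFrame.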
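A small sketch that makes the difference visible by putting one NaN into column C of the example data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Field1': [1, 1, 1],
                   'Field2': [4, 4, 6],
                   'C': [7, np.nan, 9]})   # one missing value in C
df = df.set_index(['Field1', 'Field2'])

g = df.groupby(level=['Field1', 'Field2'])
sizes = g.size()     # Series: rows per group, NaN rows included
counts = g.count()   # DataFrame: non-NaN values per column per group

print(sizes.tolist())        # [2, 1]
print(counts['C'].tolist())  # [1, 1]
```

Group (1, 4) has two rows but only one non-missing C, so size reports 2 and count reports 1.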

EDIT, based on a comment:

df.set_index(['Continent', 'Sector'], inplace=True)
print (df)
                      Count
Continent     Sector       
Asia          1           4
              2           1
Australia     1           1
Europe        1           1
              2           3
              3           2
North America 1           1
              5           1
South America 5           1

print (df['Count'])
Continent      Sector
Asia           1         4
               2         1
Australia      1         1
Europe         1         1
               2         3
               3         2
North America  1         1
               5         1
South America  5         1
Name: Count, dtype: int64

Or:

print (df.squeeze())
Continent      Sector
Asia           1         4
               2         1
Australia      1         1
Europe         1         1
               2         3
               3         2
North America  1         1
               5         1
South America  5         1
Name: Count, dtype: int64
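squeeze() returns a Series here only because the DataFrame has exactly one column. A sketch of the edge cases; the extra column Other is hypothetical:

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples([('Asia', 1), ('Asia', 2)],
                                names=['Continent', 'Sector'])
one_col = pd.DataFrame({'Count': [4, 1]}, index=idx)
two_cols = one_col.assign(Other=[0, 0])   # hypothetical second column

s = one_col.squeeze(axis='columns')           # Series named 'Count'
unchanged = two_cols.squeeze(axis='columns')  # still a DataFrame

print(type(s).__name__, s.name)   # Series Count
print(type(unchanged).__name__)   # DataFrame

# With a single row, plain squeeze() collapses all the way to a scalar
print(one_col.iloc[:1].squeeze())  # 4
```

Passing axis='columns' guards against the one-row case, where squeeze() with no axis would return a scalar instead of a Series.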

Or, all in one step with set_index, starting from the flat DataFrame:

print (df)
       Continent  Sector  Count
0           Asia       1      4
1           Asia       2      1
2      Australia       1      1
3         Europe       1      1
4         Europe       2      3
5         Europe       3      2
6  North America       1      1
7  North America       5      1
8  South America       5      1

print (df.set_index(['Continent', 'Sector'])['Count'])
Continent      Sector
Asia           1         4
               2         1
Australia      1         1
Europe         1         1
               2         3
               3         2
North America  1         1
               5         1
South America  5         1
Name: Count, dtype: int64 
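If the raw data instead has one row per observation and no precomputed Count column, grouping by the two columns and taking size gives the Series directly, which covers the "group, count, return a Series" steps from the question. The raw rows below are hypothetical, chosen so the Asia and Australia counts match the table:

```python
import pandas as pd

# Hypothetical raw data: one row per observation, no Count column yet
raw = pd.DataFrame({'Continent': ['Asia'] * 5 + ['Australia'],
                    'Sector': [1, 1, 1, 1, 2, 1]})

# size() counts rows per (Continent, Sector) pair; rename sets the Series name
counts = raw.groupby(['Continent', 'Sector']).size().rename('Count')
print(counts)
```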
jezrael, answered Sep 19 '22 17:09