 

Transforming Dataframe columns into Dataframe of rows

I have the following DataFrame:

import pandas as pd

data = {'year': [2010, 2010, 2011, 2012, 2011, 2012, 2010, 2011, 2012, 2013],
        'store_number': ['1944', '1945', '1946', '1947', '1948', '1949', '1947', '1948', '1949', '1947'],
        'retailer_name': ['Walmart','Walmart', 'CRV', 'CRV', 'CRV', 'Walmart', 'Walmart', 'CRV', 'CRV', 'CRV'],
        'product': ['a', 'b', 'a', 'a', 'b', 'a', 'b', 'a', 'a', 'c'],
        'amount': [5, 5, 8, 6, 1, 5, 10, 6, 12, 11]}

stores = pd.DataFrame(data, columns=['retailer_name', 'store_number', 'year', 'product', 'amount'])
stores.set_index(['retailer_name', 'store_number', 'year', 'product'], inplace=True)

stores.groupby(level=[0, 1, 2, 3]).sum()

I want to transform the following DataFrame:

                                         amount
retailer_name store_number year product        
CRV           1946         2011 a             8
              1947         2012 a             6
                           2013 c            11
              1948         2011 a             6
                                b             1
              1949         2012 a            12
Walmart       1944         2010 a             5
              1945         2010 b             5
              1947         2010 b            10
              1949         2012 a             5

into a DataFrame with one row per (retailer_name, store_number, year) and one column per product:

retailer_name store_number year a b c
CRV           1946         2011 8 0 0
CRV           1947         2012 6 0 0
etc...

The products are known ahead of time. Any idea how to do this?

Night Walker asked May 19 '16 20:05


2 Answers

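See below for the solution. Thanks to EdChum for corrections to the original post.

Summing per index key and then unstacking the innermost index level (product) turns each product into its own column; fillna(0) fills in the combinations that do not occur in the data.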

Without reset_index()

stores.groupby(level=[0, 1, 2, 3]).sum().unstack().fillna(0)

                                amount        
product                              a   b   c
retailer_name store_number year               
CRV           1946         2011      8   0   0
              1947         2012      6   0   0
                           2013      0   0  11
              1948         2011      6   1   0
              1949         2012     12   0   0
Walmart       1944         2010      5   0   0
              1945         2010      0   5   0
              1947         2010      0  10   0
              1949         2012      5   0   0
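
As a variation on the one-liner above, unstack() also accepts a fill_value argument, which fills the missing combinations directly instead of creating NaN first and replacing it afterwards. In recent pandas versions this usually keeps the amounts as integers rather than floats, though that may depend on the version. A minimal, self-contained sketch using the data from the question (the variable name wide is only illustrative):

```python
import pandas as pd

# Sample data copied from the question.
data = {
    'year': [2010, 2010, 2011, 2012, 2011, 2012, 2010, 2011, 2012, 2013],
    'store_number': ['1944', '1945', '1946', '1947', '1948', '1949', '1947', '1948', '1949', '1947'],
    'retailer_name': ['Walmart', 'Walmart', 'CRV', 'CRV', 'CRV', 'Walmart', 'Walmart', 'CRV', 'CRV', 'CRV'],
    'product': ['a', 'b', 'a', 'a', 'b', 'a', 'b', 'a', 'a', 'c'],
    'amount': [5, 5, 8, 6, 1, 5, 10, 6, 12, 11],
}
stores = pd.DataFrame(data).set_index(['retailer_name', 'store_number', 'year', 'product'])

# Sum any duplicate index keys, then move the 'product' level from rows to columns.
# fill_value=0 is used for (retailer, store, year, product) combinations with no data.
wide = stores.groupby(level=[0, 1, 2, 3]).sum().unstack('product', fill_value=0)

print(wide)
```

Naming the level ('product') instead of relying on the default innermost level keeps the call correct even if the index levels are reordered later.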

With reset_index()

stores.groupby(level=[0, 1, 2, 3]).sum().unstack().reset_index().fillna(0)

  retailer_name store_number  year amount        
product                                       a   b   c
0                 CRV         1946  2011      8   0   0
1                 CRV         1947  2012      6   0   0
2                 CRV         1947  2013      0   0  11
3                 CRV         1948  2011      6   1   0
4                 CRV         1949  2012     12   0   0
5             Walmart         1944  2010      5   0   0
6             Walmart         1945  2010      0   5   0
7             Walmart         1947  2010      0  10   0
8             Walmart         1949  2012      5   0   0
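
The result above still has two header rows (amount over a, b, c), whereas the question asks for flat columns retailer_name, store_number, year, a, b, c. One way to get that layout, sketched here under the assumption that the known product list is ['a', 'b', 'c'], is to unstack the amount Series instead of the whole DataFrame, so the product labels become the only column level. The names products and flat are illustrative, not from the original post.

```python
import pandas as pd

# Sample data copied from the question.
data = {
    'year': [2010, 2010, 2011, 2012, 2011, 2012, 2010, 2011, 2012, 2013],
    'store_number': ['1944', '1945', '1946', '1947', '1948', '1949', '1947', '1948', '1949', '1947'],
    'retailer_name': ['Walmart', 'Walmart', 'CRV', 'CRV', 'CRV', 'Walmart', 'Walmart', 'CRV', 'CRV', 'CRV'],
    'product': ['a', 'b', 'a', 'a', 'b', 'a', 'b', 'a', 'a', 'c'],
    'amount': [5, 5, 8, 6, 1, 5, 10, 6, 12, 11],
}
stores = pd.DataFrame(data).set_index(['retailer_name', 'store_number', 'year', 'product'])

# Assumption: this is the product list that is "known ahead" in the question.
products = ['a', 'b', 'c']

flat = (
    stores['amount']                          # a Series, so columns are not nested under 'amount'
    .groupby(level=[0, 1, 2, 3]).sum()        # sum any duplicate index keys
    .unstack('product', fill_value=0)         # one column per product, 0 where nothing was sold
    .reindex(columns=products, fill_value=0)  # guarantee a column for every known product
    .reset_index()                            # turn the remaining index levels into columns
)
flat.columns.name = None                      # drop the leftover 'product' label on the column axis

print(flat)
```

The reindex step matters only when a known product is absent from the data; in that case it adds an all-zero column for it.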

PyNoob answered Sep 30 '22 11:09


Unstack the product level from the index, then fill the NaN values with zero. The pd.IndexSlice mask restricts the fill to the columns under amount.

df = stores.groupby(level=[0, 1, 2, 3]).sum().unstack('product')
mask = pd.IndexSlice['amount', :]
df.loc[:, mask] = df.loc[:, mask].fillna(0)

>>> df
                                amount        
product                              a   b   c
retailer_name store_number year               
CRV           1946         2011      8   0   0
              1947         2012      6   0   0
                           2013      0   0  11
              1948         2011      6   1   0
              1949         2012     12   0   0
Walmart       1944         2010      5   0   0
              1945         2010      0   5   0
              1947         2010      0  10   0
              1949         2012      5   0   0
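
For comparison, the same reshaping can also be written with pivot_table after moving the index levels back into columns. This is an alternative technique, not the approach used in the answer above; it is a sketch that assumes summing is the right way to combine duplicate keys, and the name table is illustrative.

```python
import pandas as pd

# Sample data copied from the question.
data = {
    'year': [2010, 2010, 2011, 2012, 2011, 2012, 2010, 2011, 2012, 2013],
    'store_number': ['1944', '1945', '1946', '1947', '1948', '1949', '1947', '1948', '1949', '1947'],
    'retailer_name': ['Walmart', 'Walmart', 'CRV', 'CRV', 'CRV', 'Walmart', 'Walmart', 'CRV', 'CRV', 'CRV'],
    'product': ['a', 'b', 'a', 'a', 'b', 'a', 'b', 'a', 'a', 'c'],
    'amount': [5, 5, 8, 6, 1, 5, 10, 6, 12, 11],
}
stores = pd.DataFrame(data).set_index(['retailer_name', 'store_number', 'year', 'product'])

# pivot_table works on ordinary columns, so the index levels are reset first.
table = stores.reset_index().pivot_table(
    index=['retailer_name', 'store_number', 'year'],  # one output row per combination
    columns='product',                                # one output column per product
    values='amount',
    aggfunc='sum',                                    # combine duplicate keys by summing
    fill_value=0,                                     # 0 where a product has no entry
)

print(table)
```

Calling table.reset_index() afterwards turns retailer_name, store_number and year back into regular columns.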

Alexander answered Sep 30 '22 11:09