Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python Pandas: transforming - moving values from diagonal

given the following DataFrame, grouped with:

    dataset = z.groupby(
        ['app', 'regmonth', 'loginsmonth']).sum().unstack().fillna(
            0, inplace=False)


                             cnt                                      
loginsmonth           2014-02-01  2014-03-01  2014-04-01  2014-05-01   
app       regmonth                                                     
1         2014-02-01        6069        1837         107          54   
          2014-03-01           0       10742        2709        1394   
          2014-04-01           0           0        5584        1107   
          2014-05-01           0           0           0        3044   
          2014-06-01           0           0           0           0   

I would like to transform it to:

                             cnt                                      
loginsmonth           2014-02-01  2014-03-01  2014-04-01  2014-05-01   
app       regmonth                                                     
1         2014-02-01        6069        1837         107          54   
          2014-03-01       10742        2709        1394           0   
          2014-04-01        5584        1107           0           0   
          2014-05-01        3044           0           0           0
          2014-06-01           0           0           0           0   

So, it moves the diagonal to the beginning of the line and fills the void with zeros. Does panda have any easy way of doing it?

like image 866
jbastos Avatar asked Jun 04 '14 18:06

jbastos


1 Answers

But you are changing your data in the process, right?

I don't know if pandas has a nice way to do it, but np.diagnoal can do what you want here:

In [96]:

print df
loginsmonth     2014-02-01  2014-03-01  2014-04-01  2014-05-01
app regmonth                                                  
1   2014-02-01        6069        1837         107          54
    2014-03-01           0       10742        2709        1394
    2014-04-01           0           0        5584        1107
    2014-05-01           0           0           0        3044
    2014-06-01           0           0           0           0

[5 rows x 4 columns]
In [124]:

print df*0+np.asarray([np.hstack((np.diagonal(df.values, i), np.zeros(i+1, int))) 
                       for i in range(df.shape[1])]).T
loginsmonth     2014-02-01  2014-03-01  2014-04-01  2014-05-01
app regmonth                                                  
1   2014-02-01        6069        1837         107          54
    2014-03-01       10742        2709        1394           0
    2014-04-01        5584        1107           0           0
    2014-05-01        3044           0           0           0
    2014-06-01           0           0           0           0

[5 rows x 4 columns]

here the 1 in np.zeros(i+1, int) is df.shape[0]-df.shape[1]. I don't know whether you will have situation where df.shape[0]<df.shape[1].

But if your DataFrame always looks like the one shown here, which is like a upper diagonal matrix without 0s inside, you can just take a short cut:

In [134]:

print df.apply(lambda x: sorted(x, key=lambda y: y==0), axis=1)
cnt             2014-02-01  2014-03-01  2014-04-01  2014-05-01
app regmonth                                                  
1   2014-02-01        6069        1837         107          54
    2014-03-01       10742        2709        1394           0
    2014-04-01        5584        1107           0           0
    2014-05-01        3044           0           0           0
    2014-06-01           0           0           0           0

[5 rows x 4 columns] 
like image 110
CT Zhu Avatar answered Oct 20 '22 20:10

CT Zhu