Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

pivot a dataframe by diagonals

Given a DataFrame

   col1  col2  col3
0     1     4     7
1     2     5     8
2     3     6     9

How get something like this:

         0    1    2
               
0      1.0  2.0  3.0
1      5.0  4.0  7.0
2      9.0  6.0  NaN
3      NaN  8.0  NaN

if we consider the dataframe as an array of indices i, j then in diag n would be those where abs (i-j) = n

a plus would be to be able to choose the order:

intercale = True ,first_diag = 'left'

         0    1    2
               
0      1.0  2.0  3.0
1      5.0  4.0  7.0
2      9.0  6.0  NaN
3      NaN  8.0  NaN

intercalate = False, first_diag ='left'

         0    1    2
               
0      1.0  2.0  3.0
1      5.0  6.0  7.0
2      9.0  4.0  NaN
3      NaN  8.0  NaN

intercalate = True, first_diag ='right'

         0    1    2
               
0      1.0  4.0  7.0
1      5.0  2.0  3.0
2      9.0  8.0  NaN
3      NaN  6.0  NaN

intercalate = False, first_diag ='right'

         0    1    2
               
0      1.0  4.0  7.0
1      5.0  8.0  3.0
2      9.0  2.0  NaN
3      NaN  6.0  NaN

there could even be another degree of freedom to sort by choosing the direction from the lower corner to the upper corner or the other way around. Or select the other main diagonal

My approach with pandas

df2 = df.reset_index().melt('index').assign(variable = lambda x: x.variable.factorize()[0])
df2['diag'] = df2['index'].sub(df2['variable']).abs()
new_df = (df2.assign(index = df2.groupby('diag').cumcount())
             .pivot_table(index = 'index',columns = 'diag',values = 'value'))
print(new_df)
diag     0    1    2
index               
0      1.0  2.0  3.0
1      5.0  4.0  7.0
2      9.0  6.0  NaN
3      NaN  8.0  NaN

I was wondering if there could be any easier way to do this, maybe with numpy

like image 280
ansev Avatar asked Jan 23 '20 18:01

ansev


People also ask

How can I pivot a DataFrame?

DataFrame - pivot() functionThe pivot() function is used to reshaped a given DataFrame organized by given index / column values. This function does not support data aggregation, multiple values will result in a MultiIndex in the columns. Column to use to make new frame's index. If None, uses existing index.

How do I pivot multiple columns in Pandas?

Using the Pandas pivot_table() function we can reshape the DataFrame on multiple columns in the form of an Excel pivot table. To group the data in a pivot table we will need to pass a DataFrame into this function and the multiple columns you wanted to group as an index.

What is difference between pivot and pivot table in Pandas?

Basically, the pivot_table() function is a generalization of the pivot() function that allows aggregation of values — for example, through the len() function in the previous example. Pivot only works — or makes sense — if you need to pivot a table and show values without any aggregation.

How do you reshape a data frame?

melt() function is used to reshape a DataFrame from a wide to a long format. It is useful to get a DataFrame where one or more columns are identifier variables, and the other columns are unpivoted to the row axis leaving only two non-identifier columns named variable and value by default.


1 Answers

Case #1 : Order of elements in each column of output isn't important

Approach #1 : Here's one way with NumPy -

def diagonalize(a): # input is array and output is df
    n = len(a)    
    r = np.arange(n)

    idx = np.abs(r[:,None]-r)
    lens = np.r_[n,np.arange(2*n-2,0,-2)]
    split_idx = lens.cumsum()

    b = a.flat[idx.ravel().argsort()]
    v = np.split(b,split_idx[:-1])
    return pd.DataFrame(v).T

Sample run -

In [110]: df
Out[110]: 
   col1  col2  col3  col4
0     1     2     3     4
1     5     6     7     8
2     9    10    11    12
3    13    14    15    16

In [111]: diagonalize(df.to_numpy(copy=False))
Out[111]: 
      0     1     2     3
0   1.0   2.0   3.0   4.0
1   6.0   5.0   8.0  13.0
2  11.0   7.0   9.0   NaN
3  16.0  10.0  14.0   NaN
4   NaN  12.0   NaN   NaN
5   NaN  15.0   NaN   NaN

Approach #2 : Similar to earlier, but completely NumPy based and no-loops -

def diagonalize_v2(a): # input, outputs are arrays
    # Setup params
    n = len(a)    
    r = np.arange(n)

    # Get indices based on "diagonalization" (distance off diagonal)
    idx = np.abs(r[:,None]-r)
    lens = np.r_[n,np.arange(2*n-2,0,-2)]

    # Values in the order of "diagonalization"
    b = a.flat[idx.ravel().argsort()]

    # Get a mask for the final o/p where elements are to be assigned
    mask = np.arange(lens.max())[:,None]<lens

    # Setup o/p and assign
    out = np.full(mask.shape,np.nan)
    out.T[mask.T] = b
    return out

Sample run -

In [2]: a
Out[2]: 
array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [13, 14, 15, 16]])

In [3]: diagonalize_v2(a)
Out[3]: 
array([[ 1.,  2.,  3.,  4.],
       [ 6.,  5.,  8., 13.],
       [11.,  7.,  9., nan],
       [16., 10., 14., nan],
       [nan, 12., nan, nan],
       [nan, 15., nan, nan]])

Case #2 : Order of elements in each column matters

We have two additional input args to manage the order. The solution would a modified version inspired mostly by Approach #1 -

def diagonalize_generic(a, intercale = True ,first_diag = 'left'):
    # Setup params
    n = len(a)    
    r = np.arange(n)

    # Get indices based on "diagonalization" (distance off diagonal)
    idx = np.abs(r[:,None]-r)
    lens = np.r_[n,np.arange(2*n-2,0,-2)]

    if first_diag=='left':
        w = np.triu(np.ones(n, dtype=int))
    elif first_diag=='right':
        w = np.tril(np.ones(n, dtype=int))
    else:
        raise Exception('Wrong first_diag value!')

    order = np.lexsort(np.c_[w.ravel(),idx.ravel()].T)

    split_idx = lens.cumsum()
    o_split = np.split(order,split_idx[:-1])

    f = a.flat

    if intercale==1:
        v = [f[o_split[0]]] + [f[o.reshape(2,-1).ravel('F')] for o in o_split[1:]]
    else:
        v = [f[o] for o in o_split]
    return pd.DataFrame(v).T

Sample run

Input as array :

In [53]: a
Out[53]: 
array([[1, 4, 7],
       [2, 5, 8],
       [3, 6, 9]])

Different scenarios :

In [54]: diagonalize_generic(a, intercale = True, first_diag = 'left')
Out[54]: 
     0    1    2
0  1.0  2.0  3.0
1  5.0  4.0  7.0
2  9.0  6.0  NaN
3  NaN  8.0  NaN

In [55]: diagonalize_generic(a, intercale = False, first_diag = 'left')
Out[55]: 
     0    1    2
0  1.0  2.0  3.0
1  5.0  6.0  7.0
2  9.0  4.0  NaN
3  NaN  8.0  NaN

In [56]: diagonalize_generic(a, intercale = True, first_diag = 'right')
Out[56]: 
     0    1    2
0  1.0  4.0  7.0
1  5.0  2.0  3.0
2  9.0  8.0  NaN
3  NaN  6.0  NaN

In [57]: diagonalize_generic(a, intercale = False, first_diag = 'right')
Out[57]: 
     0    1    2
0  1.0  4.0  7.0
1  5.0  8.0  3.0
2  9.0  2.0  NaN
3  NaN  6.0  NaN
like image 197
Divakar Avatar answered Oct 09 '22 15:10

Divakar