Pivot a DataFrame by diagonals

Tags: python, pandas, dataframe, numpy

Given a DataFrame

   col1  col2  col3
0     1     4     7
1     2     5     8
2     3     6     9

How can I get something like this:

         0    1    2
               
0      1.0  2.0  3.0
1      5.0  4.0  7.0
2      9.0  6.0  NaN
3      NaN  8.0  NaN

If we consider the DataFrame as an array indexed by i, j, then diagonal n consists of the cells where abs(i - j) = n.
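To make that definition concrete, here is a minimal sketch, assuming NumPy is available (the names `n`, `i`, `j` and `dist` are only illustrative). It builds the matrix of abs(i - j) for a 3x3 frame; the value in each cell is the output column that cell should land in.

```python
import numpy as np

n = 3
i, j = np.indices((n, n))   # row index and column index of every cell
dist = np.abs(i - j)        # distance of each cell from the main diagonal
print(dist)
# [[0 1 2]
#  [1 0 1]
#  [2 1 0]]
```

Column 0 of the desired output collects the cells marked 0, column 1 the cells marked 1, and so on.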

A plus would be the ability to choose the order:

intercale = True, first_diag = 'left'

         0    1    2
               
0      1.0  2.0  3.0
1      5.0  4.0  7.0
2      9.0  6.0  NaN
3      NaN  8.0  NaN

intercalate = False, first_diag ='left'

         0    1    2
               
0      1.0  2.0  3.0
1      5.0  6.0  7.0
2      9.0  4.0  NaN
3      NaN  8.0  NaN

intercalate = True, first_diag ='right'

         0    1    2
               
0      1.0  4.0  7.0
1      5.0  2.0  3.0
2      9.0  8.0  NaN
3      NaN  6.0  NaN

intercalate = False, first_diag ='right'

         0    1    2
               
0      1.0  4.0  7.0
1      5.0  8.0  3.0
2      9.0  2.0  NaN
3      NaN  6.0  NaN

There could even be another degree of freedom: sorting each diagonal from the lower corner to the upper corner or the other way around, or selecting the other main diagonal.

My approach with pandas:

import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6], 'col3': [7, 8, 9]})
df2 = df.reset_index().melt('index').assign(variable = lambda x: x.variable.factorize()[0])
df2['diag'] = df2['index'].sub(df2['variable']).abs()
new_df = (df2.assign(index = df2.groupby('diag').cumcount())
             .pivot_table(index = 'index',columns = 'diag',values = 'value'))
print(new_df)
diag     0    1    2
index               
0      1.0  2.0  3.0
1      5.0  4.0  7.0
2      9.0  6.0  NaN
3      NaN  8.0  NaN

I was wondering if there is an easier way to do this, maybe with NumPy.

asked Jan 23 '20 18:01

ansev

1 Answer

Case #1 : Order of elements in each column of output isn't important

Approach #1 : Here's one way with NumPy -

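import numpy as np
import pandas as pd

def diagonalize(a): # input is array and output is df
    n = len(a)  # assumes a square (n x n) array
    r = np.arange(n)

    # Distance of every cell from the main diagonal, abs(i - j)
    idx = np.abs(r[:,None]-r)
    # Number of cells at each distance: n, 2*(n-1), ..., 2
    lens = np.r_[n,np.arange(2*n-2,0,-2)]
    split_idx = lens.cumsum()

    # Flat values ordered by distance, then cut into one group per distance
    b = a.flat[idx.ravel().argsort()]
    v = np.split(b,split_idx[:-1])
    # Ragged groups become columns; shorter ones are padded with NaN
    return pd.DataFrame(v).T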

Sample run -

In [110]: df
Out[110]: 
   col1  col2  col3  col4
0     1     2     3     4
1     5     6     7     8
2     9    10    11    12
3    13    14    15    16

In [111]: diagonalize(df.to_numpy(copy=False))
Out[111]: 
      0     1     2     3
0   1.0   2.0   3.0   4.0
1   6.0   5.0   8.0  13.0
2  11.0   7.0   9.0   NaN
3  16.0  10.0  14.0   NaN
4   NaN  12.0   NaN   NaN
5   NaN  15.0   NaN   NaN
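
As a sanity check on the group sizes, here is a small sketch (the variable names are only illustrative) that compares the `lens` formula with a direct count of cells per distance for n = 4. It also shows one hedged option for a reproducible order inside each column: `argsort` does not use a stable sort by default, so passing `kind="stable"` is a way to keep cells at the same distance in row-major order.

```python
import numpy as np

n = 4
r = np.arange(n)
idx = np.abs(r[:, None] - r)                   # distance from the main diagonal
lens = np.r_[n, np.arange(2 * n - 2, 0, -2)]   # formula used in diagonalize
counts = np.bincount(idx.ravel())              # direct count per distance
print(lens.tolist())    # [4, 6, 4, 2]
print(counts.tolist())  # [4, 6, 4, 2]

# A stable sort keeps cells of equal distance in row-major order
a = np.arange(1, n * n + 1).reshape(n, n)
order = idx.ravel().argsort(kind="stable")
cols = np.split(a.flat[order], lens.cumsum()[:-1])
print(cols[1].tolist())  # [2, 5, 7, 10, 12, 15]
```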

Approach #2 : Similar to the previous one, but entirely NumPy-based, with no loops and with arrays as both input and output -

def diagonalize_v2(a): # input, outputs are arrays
    # Setup params
    n = len(a)    
    r = np.arange(n)

    # Get indices based on "diagonalization" (distance off diagonal)
    idx = np.abs(r[:,None]-r)
    lens = np.r_[n,np.arange(2*n-2,0,-2)]

    # Values in the order of "diagonalization"
    b = a.flat[idx.ravel().argsort()]

    # Get a mask for the final o/p where elements are to be assigned
    mask = np.arange(lens.max())[:,None]<lens

    # Setup o/p and assign
    out = np.full(mask.shape,np.nan)
    out.T[mask.T] = b
    return out

Sample run -

In [2]: a
Out[2]: 
array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [13, 14, 15, 16]])

In [3]: diagonalize_v2(a)
Out[3]: 
array([[ 1.,  2.,  3.,  4.],
       [ 6.,  5.,  8., 13.],
       [11.,  7.,  9., nan],
       [16., 10., 14., nan],
       [nan, 12., nan, nan],
       [nan, 15., nan, nan]])
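
The mask step is the part that replaces the split-and-pad from Approach #1, so here is a minimal sketch of it in isolation. The group sizes and values are hypothetical, chosen only to show where each value ends up.

```python
import numpy as np

lens = np.array([2, 3, 1])                     # hypothetical group sizes
vals = np.arange(1, lens.sum() + 1)            # 1..6, already ordered group by group
mask = np.arange(lens.max())[:, None] < lens   # True where a slot of a column is used
out = np.full(mask.shape, np.nan)
out.T[mask.T] = vals                           # out.T is a view, so this fills out column by column
print(out)
# [[ 1.  3.  6.]
#  [ 2.  4. nan]
#  [nan  5. nan]]
```

Boolean assignment walks `mask.T` in row-major order, which corresponds to going down the columns of `out`, so each group fills the top of its own column and the rest stays NaN.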

Case #2 : Order of elements in each column matters

We have two additional input args to manage the order. The solution is a modified version, inspired mostly by Approach #1 -

def diagonalize_generic(a, intercale=True, first_diag='left'):
    # Setup params
    n = len(a)    
    r = np.arange(n)

    # Get indices based on "diagonalization" (distance off diagonal)
    idx = np.abs(r[:,None]-r)
    lens = np.r_[n,np.arange(2*n-2,0,-2)]

    # w breaks ties within each distance: cells with w == 0 are taken first
    if first_diag=='left':
        w = np.triu(np.ones((n, n), dtype=int))
    elif first_diag=='right':
        w = np.tril(np.ones((n, n), dtype=int))
    else:
        raise ValueError('Wrong first_diag value!')

    order = np.lexsort(np.c_[w.ravel(),idx.ravel()].T)

    split_idx = lens.cumsum()
    o_split = np.split(order,split_idx[:-1])

    f = a.flat

    if intercale:
        # Alternate between the two sides of each off-diagonal group
        v = [f[o_split[0]]] + [f[o.reshape(2,-1).ravel('F')] for o in o_split[1:]]
    else:
        v = [f[o] for o in o_split]
    return pd.DataFrame(v).T

Sample run

Input as array :

In [53]: a
Out[53]: 
array([[1, 4, 7],
       [2, 5, 8],
       [3, 6, 9]])

Different scenarios :

In [54]: diagonalize_generic(a, intercale = True, first_diag = 'left')
Out[54]: 
     0    1    2
0  1.0  2.0  3.0
1  5.0  4.0  7.0
2  9.0  6.0  NaN
3  NaN  8.0  NaN

In [55]: diagonalize_generic(a, intercale = False, first_diag = 'left')
Out[55]: 
     0    1    2
0  1.0  2.0  3.0
1  5.0  6.0  7.0
2  9.0  4.0  NaN
3  NaN  8.0  NaN

In [56]: diagonalize_generic(a, intercale = True, first_diag = 'right')
Out[56]: 
     0    1    2
0  1.0  4.0  7.0
1  5.0  2.0  3.0
2  9.0  8.0  NaN
3  NaN  6.0  NaN

In [57]: diagonalize_generic(a, intercale = False, first_diag = 'right')
Out[57]: 
     0    1    2
0  1.0  4.0  7.0
1  5.0  8.0  3.0
2  9.0  2.0  NaN
3  NaN  6.0  NaN
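
The two ordering tricks in `diagonalize_generic` can be looked at in isolation. The sketch below uses made-up keys and values (all names here are illustrative) to show that `np.lexsort` treats the last key as the primary one, and that `reshape(2,-1).ravel('F')` interleaves the two halves of a group.

```python
import numpy as np

# np.lexsort sorts by the LAST key first, so dist is the primary key
# and side only breaks ties among equal distances.
dist = np.array([1, 0, 1, 1, 0, 1])
side = np.array([1, 0, 0, 1, 0, 0])
order = np.lexsort((side, dist))
print(order.tolist())  # [1, 4, 2, 5, 0, 3]

# First half comes from one side of the diagonal, second half from the other.
group = np.array([2, 6, 4, 8])
interleaved = group.reshape(2, -1).ravel('F')   # read the 2-row array column by column
print(interleaved.tolist())  # [2, 4, 6, 8]
```

The second result corresponds to column 1 of the `intercale = True, first_diag = 'left'` scenario above, while the un-interleaved `group` corresponds to the `intercale = False` one.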

answered Oct 09 '22 15:10

Divakar
