Given a DataFrame
col1 col2 col3
0 1 4 7
1 2 5 8
2 3 6 9
How get something like this:
0 1 2
0 1.0 2.0 3.0
1 5.0 4.0 7.0
2 9.0 6.0 NaN
3 NaN 8.0 NaN
if we consider the dataframe as an array of indices i, j
then in diag n would be those where abs (i-j) = n
a plus would be to be able to choose the order:
intercale = True ,first_diag = 'left'
0 1 2
0 1.0 2.0 3.0
1 5.0 4.0 7.0
2 9.0 6.0 NaN
3 NaN 8.0 NaN
intercalate = False, first_diag ='left'
0 1 2
0 1.0 2.0 3.0
1 5.0 6.0 7.0
2 9.0 4.0 NaN
3 NaN 8.0 NaN
intercalate = True, first_diag ='right'
0 1 2
0 1.0 4.0 7.0
1 5.0 2.0 3.0
2 9.0 8.0 NaN
3 NaN 6.0 NaN
intercalate = False, first_diag ='right'
0 1 2
0 1.0 4.0 7.0
1 5.0 8.0 3.0
2 9.0 2.0 NaN
3 NaN 6.0 NaN
there could even be another degree of freedom to sort by choosing the direction from the lower corner to the upper corner or the other way around. Or select the other main diagonal
My approach with pandas
df2 = df.reset_index().melt('index').assign(variable = lambda x: x.variable.factorize()[0])
df2['diag'] = df2['index'].sub(df2['variable']).abs()
new_df = (df2.assign(index = df2.groupby('diag').cumcount())
.pivot_table(index = 'index',columns = 'diag',values = 'value'))
print(new_df)
diag 0 1 2
index
0 1.0 2.0 3.0
1 5.0 4.0 7.0
2 9.0 6.0 NaN
3 NaN 8.0 NaN
I was wondering if there could be any easier way to do this, maybe with numpy
DataFrame - pivot() functionThe pivot() function is used to reshaped a given DataFrame organized by given index / column values. This function does not support data aggregation, multiple values will result in a MultiIndex in the columns. Column to use to make new frame's index. If None, uses existing index.
Using the Pandas pivot_table() function we can reshape the DataFrame on multiple columns in the form of an Excel pivot table. To group the data in a pivot table we will need to pass a DataFrame into this function and the multiple columns you wanted to group as an index.
Basically, the pivot_table() function is a generalization of the pivot() function that allows aggregation of values — for example, through the len() function in the previous example. Pivot only works — or makes sense — if you need to pivot a table and show values without any aggregation.
melt() function is used to reshape a DataFrame from a wide to a long format. It is useful to get a DataFrame where one or more columns are identifier variables, and the other columns are unpivoted to the row axis leaving only two non-identifier columns named variable and value by default.
Approach #1 : Here's one way with NumPy -
def diagonalize(a): # input is array and output is df
n = len(a)
r = np.arange(n)
idx = np.abs(r[:,None]-r)
lens = np.r_[n,np.arange(2*n-2,0,-2)]
split_idx = lens.cumsum()
b = a.flat[idx.ravel().argsort()]
v = np.split(b,split_idx[:-1])
return pd.DataFrame(v).T
Sample run -
In [110]: df
Out[110]:
col1 col2 col3 col4
0 1 2 3 4
1 5 6 7 8
2 9 10 11 12
3 13 14 15 16
In [111]: diagonalize(df.to_numpy(copy=False))
Out[111]:
0 1 2 3
0 1.0 2.0 3.0 4.0
1 6.0 5.0 8.0 13.0
2 11.0 7.0 9.0 NaN
3 16.0 10.0 14.0 NaN
4 NaN 12.0 NaN NaN
5 NaN 15.0 NaN NaN
Approach #2 : Similar to earlier, but completely NumPy based and no-loops -
def diagonalize_v2(a): # input, outputs are arrays
# Setup params
n = len(a)
r = np.arange(n)
# Get indices based on "diagonalization" (distance off diagonal)
idx = np.abs(r[:,None]-r)
lens = np.r_[n,np.arange(2*n-2,0,-2)]
# Values in the order of "diagonalization"
b = a.flat[idx.ravel().argsort()]
# Get a mask for the final o/p where elements are to be assigned
mask = np.arange(lens.max())[:,None]<lens
# Setup o/p and assign
out = np.full(mask.shape,np.nan)
out.T[mask.T] = b
return out
Sample run -
In [2]: a
Out[2]:
array([[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9, 10, 11, 12],
[13, 14, 15, 16]])
In [3]: diagonalize_v2(a)
Out[3]:
array([[ 1., 2., 3., 4.],
[ 6., 5., 8., 13.],
[11., 7., 9., nan],
[16., 10., 14., nan],
[nan, 12., nan, nan],
[nan, 15., nan, nan]])
We have two additional input args to manage the order. The solution would a modified version inspired mostly by Approach #1
-
def diagonalize_generic(a, intercale = True ,first_diag = 'left'):
# Setup params
n = len(a)
r = np.arange(n)
# Get indices based on "diagonalization" (distance off diagonal)
idx = np.abs(r[:,None]-r)
lens = np.r_[n,np.arange(2*n-2,0,-2)]
if first_diag=='left':
w = np.triu(np.ones(n, dtype=int))
elif first_diag=='right':
w = np.tril(np.ones(n, dtype=int))
else:
raise Exception('Wrong first_diag value!')
order = np.lexsort(np.c_[w.ravel(),idx.ravel()].T)
split_idx = lens.cumsum()
o_split = np.split(order,split_idx[:-1])
f = a.flat
if intercale==1:
v = [f[o_split[0]]] + [f[o.reshape(2,-1).ravel('F')] for o in o_split[1:]]
else:
v = [f[o] for o in o_split]
return pd.DataFrame(v).T
Sample run
Input as array :
In [53]: a
Out[53]:
array([[1, 4, 7],
[2, 5, 8],
[3, 6, 9]])
Different scenarios :
In [54]: diagonalize_generic(a, intercale = True, first_diag = 'left')
Out[54]:
0 1 2
0 1.0 2.0 3.0
1 5.0 4.0 7.0
2 9.0 6.0 NaN
3 NaN 8.0 NaN
In [55]: diagonalize_generic(a, intercale = False, first_diag = 'left')
Out[55]:
0 1 2
0 1.0 2.0 3.0
1 5.0 6.0 7.0
2 9.0 4.0 NaN
3 NaN 8.0 NaN
In [56]: diagonalize_generic(a, intercale = True, first_diag = 'right')
Out[56]:
0 1 2
0 1.0 4.0 7.0
1 5.0 2.0 3.0
2 9.0 8.0 NaN
3 NaN 6.0 NaN
In [57]: diagonalize_generic(a, intercale = False, first_diag = 'right')
Out[57]:
0 1 2
0 1.0 4.0 7.0
1 5.0 8.0 3.0
2 9.0 2.0 NaN
3 NaN 6.0 NaN
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With