Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to efficiently change data layout of a DataFrame in pandas?

Tags:

python

pandas

I've got a DataFrame with values arranged in two columns, see table T1. Would like to rearrange the values in a way to create data layout as shown in table T2. Rows in T2 are created by transposing a "sliding window" of values, moving down the column a in table T1. Is there some clever way in panda's to do this efficiently?

T1              T2

 a | b           A |  B |  C | D
------          ---------------
41 | 5          41 | 42 | 43 | 7
42 | 6          42 | 43 | 44 | 8
43 | 7    -->   43 | 44 | 45 | 9
44 | 8          44 | 45 | .. | .
45 | 9          45 | .. | .. | .
.. | .          .. | .. | .. | .
.. | .          .. | .. | .. | .
like image 746
Joanna Kowal Avatar asked Nov 14 '19 17:11

Joanna Kowal


2 Answers

You could use as_strided:

from numpy.lib.stride_tricks import as_strided

window = 3
stride = df['a'].values.strides[0]

pd.DataFrame(as_strided(df['a'].values, 
                        shape=(len(df) - window + 1, window),
                        strides = (stride,stride))
            )

Output:

     0   1   2
0   41  42  43
1   42  43  44
2   43  44  45
like image 187
Quang Hoang Avatar answered Oct 27 '22 00:10

Quang Hoang


This should do the trick:

df = df.rename(columns={"b": "D", "a": "A"})

df["B"] = df["A"].shift(-1)
df["C"] = df["A"].shift(-2)
df["D"] = df["D"].shift(-2)
df = df.sort_index(axis=1)

Output:

    A     B     C    D
0  41  42.0  43.0  7.0
1  42  43.0  44.0  8.0
2  43  44.0  45.0  9.0
3  44  45.0   NaN  NaN
4  45   NaN   NaN  NaN
like image 38
Grzegorz Skibinski Avatar answered Oct 27 '22 00:10

Grzegorz Skibinski