I have this input worksheet:
raw = pd.read_excel("raw.xlsx", header=None)
>Out[4]:
0 1 2 3 4
0 48 59.0 28.0 6.0 36.0
1 41 36.0 52.0 3.0 22.0
2 32 86.0 66.0 68.0 9.0
3 71 23.0 6.0 98.0 19.0
4 18 92.0 66.0 6.0 54.0
5 Andy NaN NaN NaN NaN
6 56 89.0 6.0 32.0 50.0
7 3 68.0 49.0 93.0 15.0
8 27 65.0 94.0 96.0 66.0
9 40 96.0 71.0 22.0 83.0
10 96 23.0 5.0 49.0 14.0
11 Bob NaN NaN NaN NaN
12 43 34.0 42.0 11.0 73.0
13 42 41.0 17.0 91.0 35.0
14 81 74.0 24.0 95.0 95.0
15 89 57.0 35.0 66.0 56.0
16 54 76.0 55.0 72.0 63.0
17 David NaN NaN NaN NaN
18 58 8.0 62.0 63.0 8.0
19 15 93.0 97.0 38.0 5.0
20 13 96.0 42.0 51.0 48.0
21 23 88.0 20.0 91.0 39.0
22 9 67.0 45.0 58.0 92.0
23 Bill NaN NaN NaN NaN
24 2 3.0 80.0 28.0 38.0
25 100 68.0 83.0 26.0 45.0
26 79 57.0 40.0 76.0 83.0
27 12 98.0 76.0 63.0 53.0
28 60 88.0 70.0 13.0 50.0
29 Luke NaN NaN NaN NaN
I have to rearrange the "block' of data on a single line, followed by the name. The format is fixed, and the output should look like this
>Out[6]:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
0 48 59 28 6 36 41 36 52 3 22 32 86 66 68 9 71 23 6 98 19 18 92 66 6 54 Andy
1 56 89 6 32 50 3 68 49 93 15 27 65 94 96 66 40 96 71 22 83 96 23 5 49 14 Bob
2 43 34 42 11 73 42 41 17 91 35 81 74 24 95 95 89 57 35 66 56 54 76 55 72 63 David
3 58 8 62 63 8 15 93 97 38 5 13 96 42 51 48 23 88 20 91 39 9 67 45 58 92 Bill
4 2 3 80 28 38 100 68 83 26 45 79 57 40 76 83 12 98 76 63 53 60 88 70 13 50 Luke
Which is the most pythonic way to do this?
Thanks
First part of code is many lines that read-in data from string lines, this is just for example, you need to do your pd.read_excel() instead.
The real part of code is conversion which is in few next lines:
a = df.values
a = a.reshape([a.size // 30, 30])
a = a[:, :-4]
df = pd.DataFrame(a)
This above can be even shortened to one-liner:
df = pd.DataFrame(df.values.reshape((-1, 30))[:, :-4])
Full code down below:
Try it online!
import pandas as pd, numpy as np
# Next is just reading-in my data in a fancy way,
# you do pd.read_excel(file) instead like you did before
df = pd.DataFrame([line.split() for line in """
48 59.0 28.0 6.0 36.0
41 36.0 52.0 3.0 22.0
32 86.0 66.0 68.0 9.0
71 23.0 6.0 98.0 19.0
18 92.0 66.0 6.0 54.0
Andy NaN NaN NaN NaN
56 89.0 6.0 32.0 50.0
3 68.0 49.0 93.0 15.0
27 65.0 94.0 96.0 66.0
40 96.0 71.0 22.0 83.0
96 23.0 5.0 49.0 14.0
Bob NaN NaN NaN NaN
43 34.0 42.0 11.0 73.0
42 41.0 17.0 91.0 35.0
81 74.0 24.0 95.0 95.0
89 57.0 35.0 66.0 56.0
54 76.0 55.0 72.0 63.0
David NaN NaN NaN NaN
58 8.0 62.0 63.0 8.0
15 93.0 97.0 38.0 5.0
13 96.0 42.0 51.0 48.0
23 88.0 20.0 91.0 39.0
9 67.0 45.0 58.0 92.0
Bill NaN NaN NaN NaN
2 3.0 80.0 28.0 38.0
100 68.0 83.0 26.0 45.0
79 57.0 40.0 76.0 83.0
12 98.0 76.0 63.0 53.0
60 88.0 70.0 13.0 50.0
Luke NaN NaN NaN NaN
""".splitlines() if line.strip()])
a = df.values
a = a.reshape([a.size // 30, 30])
a = a[:, :-4]
df = pd.DataFrame(a)
print(df)
Outputs:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
0 48 59.0 28.0 6.0 36.0 41 36.0 52.0 3.0 22.0 32 86.0 66.0 68.0 9.0 71 23.0 6.0 98.0 19.0 18 92.0 66.0 6.0 54.0 Andy
1 56 89.0 6.0 32.0 50.0 3 68.0 49.0 93.0 15.0 27 65.0 94.0 96.0 66.0 40 96.0 71.0 22.0 83.0 96 23.0 5.0 49.0 14.0 Bob
2 43 34.0 42.0 11.0 73.0 42 41.0 17.0 91.0 35.0 81 74.0 24.0 95.0 95.0 89 57.0 35.0 66.0 56.0 54 76.0 55.0 72.0 63.0 David
3 58 8.0 62.0 63.0 8.0 15 93.0 97.0 38.0 5.0 13 96.0 42.0 51.0 48.0 23 88.0 20.0 91.0 39.0 9 67.0 45.0 58.0 92.0 Bill
4 2 3.0 80.0 28.0 38.0 100 68.0 83.0 26.0 45.0 79 57.0 40.0 76.0 83.0 12 98.0 76.0 63.0 53.0 60 88.0 70.0 13.0 50.0 Luke
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With