Using Python 3.6, I have the results of a text reader that gives me repeating lines like this:
df
Col 1
0 Text A1
1 Text B1
2 Text C1
3 Text D1
4 Text E1
5 Text A2
6 Text B2
7 Text C2
8 Text D2
9 Text E2
10 Text A3
11 Text B3
12 Text C3
13 Text D3
14 Text E3
Edit: Some of the above entries are blank. There are no commas, so I can't use str.split(), and I'm not sure reshaping is the right way to go. The information repeats every 5 entries, and I'm trying to separate it into columns so that it looks like:
Col1 Col2 Col3 Col4 Col5
0 Text A1 Text B1 Text C1 Text D1 Text E1
1 Text A2 Text B2 Text C2 Text D2 Text E2
2 Text A3 Text B3 Text C3 Text D3 Text E3
What is the Pythonic way to reshape or split this into 5 columns without relying on punctuation in the text?
TBH, if you know they repeat every 5, I would reshape:
In [36]: pd.DataFrame(df.values.reshape(-1, 5), columns=[f"Col {i}" for i in range(1,6)])
Out[36]:
Col 1 Col 2 Col 3 Col 4 Col 5
0 Text A1 Text B1 Text C1 Text D1 Text E1
1 Text A2 Text B2 Text C2 Text D2 Text E2
2 Text A3 Text B3 Text C3 Text D3 Text E3
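A minimal sketch of the reshape approach wrapped in a reusable function. It assumes the single column is named "Col 1" as in the question and that blank entries are kept as empty strings rather than dropped. Because reshape is purely positional, a blank entry simply becomes an empty cell. The length check guards against a missing entry that would otherwise shift every later value. The helper name `reshape_every_n` is hypothetical.

```python
import pandas as pd


def reshape_every_n(df, n=5, col="Col 1"):
    """Lay out a single column row by row, n values per row (hypothetical helper)."""
    values = df[col].to_numpy()
    # reshape only works when the entry count divides evenly into rows of n
    if len(values) % n != 0:
        raise ValueError(f"{len(values)} values is not a multiple of {n}")
    return pd.DataFrame(values.reshape(-1, n),
                        columns=[f"Col {i}" for i in range(1, n + 1)])


# Example input: 15 entries in the order A1..E1, A2..E2, A3..E3
df = pd.DataFrame({"Col 1": [f"Text {c}{r}" for r in range(1, 4) for c in "ABCDE"]})
df.loc[7, "Col 1"] = ""  # make "Text C2" blank, as in the edit to the question

wide = reshape_every_n(df)
print(wide)
```

The blank stays in its slot (row 1, "Col 3") because nothing about the text is inspected.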
Personally, though, I'm wary of missing values, so I'd probably group by some function of the strings, e.g.
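import string  # string.digits supplies the trailing digits to strip
pd.concat([v.reset_index(drop=True)
           for _, v in df.groupby(df["Col 1"].str.rstrip(string.digits))], axis=1)
or something similar. (Every resulting column is named "Col 1", so you would rename them afterwards.)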
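A different technique from the string-based groupby above: key each entry on its position (row = position // 5, column = position % 5) and unstack. This is a minimal sketch that assumes the column is named "Col 1" and that entries can only go missing at the end of the list. A trailing partial group then appears as NaN rather than raising, as reshape would. Entries missing from the middle still cannot be detected this way. `pivot_every_n` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def pivot_every_n(df, n=5, col="Col 1"):
    """Spread one column into n columns using each entry's position (hypothetical helper)."""
    pos = np.arange(len(df))
    # Two-level index: outer level = output row, inner level = output column
    s = pd.Series(df[col].to_numpy(),
                  index=pd.MultiIndex.from_arrays([pos // n, pos % n]))
    wide = s.unstack()  # missing (row, column) slots are filled with NaN
    wide.columns = [f"Col {i + 1}" for i in wide.columns]
    return wide


# Example input: only 13 entries, so "Text D3" and "Text E3" are missing
texts = [f"Text {c}{r}" for r in range(1, 4) for c in "ABCDE"][:13]
df = pd.DataFrame({"Col 1": texts})

wide = pivot_every_n(df)
print(wide)
```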