I have the following input file in CSV format:
A,B,C,D
1,2,|3|4|5|6|7|8,9
11,12,|13|14|15|16|17|18,19
How do I split column C down the middle into two new rows, adding a column E where the first half of the split gets "0" and the second half gets "1"? The desired output is:
A,B,C,D,E
1,2,|3|4|5,9,0
1,2,|6|7|8,9,1
11,12,|13|14|15,19,0
11,12,|16|17|18,19,1
Thank you
Here's how to do it without Pandas:
import csv

with open("input.csv", newline="") as f_in, open("output.csv", "w", newline="") as f_out:
    reader = csv.reader(f_in)
    header = next(reader)  # read header
    header += ["E"]  # modify header

    writer = csv.writer(f_out)
    writer.writerow(header)

    for row in reader:
        a, b, c, d = row  # assign 4 items for each row
        c_items = [x.strip() for x in c.split("|") if x.strip()]
        n_2 = len(c_items) // 2  # halfway index
        c1 = "|" + "|".join(c_items[:n_2])
        c2 = "|" + "|".join(c_items[n_2:])
        writer.writerow([a, b, c1, d, 0])  # 0 & 1 will be converted to str on write
        writer.writerow([a, b, c2, d, 1])
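The halving step in the loop above can also be pulled into a small helper, which makes it easier to see what happens when column C holds an odd number of items. This is only a sketch: the name split_in_half is made up for illustration and is not part of the original answer.

```python
def split_in_half(cell):
    """Split a pipe-delimited cell such as "|3|4|5|6" into two halves."""
    # Drop empty pieces (e.g. the one before the leading "|").
    items = [x.strip() for x in cell.split("|") if x.strip()]
    mid = len(items) // 2  # with an odd count, the second half gets the extra item
    first = "|" + "|".join(items[:mid])
    second = "|" + "|".join(items[mid:])
    return first, second


print(split_in_half("|3|4|5|6|7|8"))  # ('|3|4|5', '|6|7|8')
print(split_in_half("|1|2|3"))        # ('|1', '|2|3')
```

Because the midpoint uses floor division, an odd-length list puts the extra item in the second half; if you want it in the first half instead, (len(items) + 1) // 2 would be one way to do that.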
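If I understand you correctly, you can split column 'C' on "|" into two halves, then .explode() the column and join each half again:
import pandas as pd

df = pd.read_csv("input.csv")

# Turn each value of C into a two-element list: [first half, second half].
# The walrus operator (:=) needs Python 3.8+.
df["C"] = df["C"].apply(
    lambda x: [
        (vals := x.strip(" |").split("|"))[: len(vals) // 2],
        vals[len(vals) // 2 :],
    ]
)
# E gets 0 for the first half and 1 for the second half.
df["E"] = df["C"].apply(lambda x: range(len(x)))
# Exploding several columns at once needs pandas 1.3+.
df = df.explode(["C", "E"])
df["C"] = "|" + df["C"].apply("|".join)
print(df.to_csv(index=False))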
Prints:
A,B,C,D,E
1,2,|3|4|5,9,0
1,2,|6|7|8,9,1
11,12,|13|14|15,19,0
11,12,|16|17|18,19,1