I am trying to split a csv into multiple files based on two column values. For example,
Source file:
Header1 Header2 Header3
Alpha energy 0.1
Alpha energy 0.34
Beta energy_imbalance 0.66
Beta energy 0.7
Beta energy 0.1
Gamma energy_imbalance 0.3
Expected output:
Outfile1:
Header1 Header2 Header3
Alpha energy 0.1
Alpha energy 0.34
Outfile2:
Header1 Header2 Header3
Beta energy_imbalance 0.66
Outfile3:
Header1 Header2 Header3
Beta energy 0.7
Beta energy 0.1
Outfile4:
Header1 Header2 Header3
Gamma energy_imbalance 0.3
The following is what I started with:
filein = open('test.csv')
csvin = csv.DictReader(filein)
outputs = {}
for row in csvin:
primaryValue = row['Header1']
secondaryValue = row['Header2']
if primaryValue not in outputs:
fileout = open('{}_{}.csv'.format(primaryValue,secondaryValue),'w')
dw = csv.DictWriter(fileout, fieldnames=csvin.fieldnames)
dw.writeheader()
outputs[primaryValue] = fileout, dw
outputs[primaryValue][1].writerow(row)
for fileout, _ in outputs.values():
fileout.close()
I was able to split the file based on column = Header1, however I am not sure how to proceed further.
Here's how to implement in a manner along the lines of what @Barmar's suggested (i.e. using the two column values as a dictionary key). As shown, the key is used to look-up which csv.DictWriter instance that gets used to write the row — creating new ones as necessary. It also closes all the associated files that were opened at the end by using a separate list that keesp track of those.
import csv
infile_name = 'multicol_test.csv'
with open(infile_name, newline='') as infile:
csv_writers = {}
files = []
reader = csv.DictReader(infile)
for row in reader:
if (key := f"{row['Header1']}_{row['Header2']}") not in csv_writers:
# Create the csv file and a corresponding DictWriter.
outfile_name = f'{key}.csv'
fileout = open(outfile_name, 'w', newline='')
files.append(fileout) # To have it closed later.
writer = csv.DictWriter(fileout, fieldnames=reader.fieldnames)
writer.writeheader()
csv_writers[key] = writer
# Write the line to corresponding csv writer.
csv_writers[key].writerow(row)
# Close all CSV output files.
for f in files:
f.close()
Applied to the sample input file, this would produce the following csv output files:
Alpha_energy.csv
Beta_energy.csv
Beta_energy_imbalance.csv
Gamma_energy_imbalance.csv
with the data in them you expect.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With