I have two CSVs that I want to combine with a left join. The key column is "id". Both CSVs have a non-key column named "result", and I want the "result" value from the second CSV to override the first wherever the second CSV has a value. How can I achieve that using pandas or any scripting language? My expected final output is shown below.
input.csv:
id,scenario,data1,data2,result
1,s1,300,400,"{s1,not added}"
2,s2,500,101,"{s2 added}"
3,s3,600,202,
output.csv:
id,result
1,"{s1,added}"
3,"{s3,added}"
final_output.csv:
id,scenario,data1,data2,result
1,s1,300,400,"{s1,added}"
2,s2,500,101,"{s2 added}"
3,s3,600,202,"{s3,added}"
import pandas as pd
a = pd.read_csv("input.csv")
b = pd.read_csv("output.csv")
merged = a.merge(b, on='id', how='left')
merged.to_csv("final_output.csv", index=False)
With this code the result column appears twice. I want it only once, with the value from the second CSV overriding the first wherever one exists. How do I get a single result column?
To merge all CSV files in a folder, use the glob module: os.path.join() builds the file pattern for glob, and the DataFrames read from the matching files are passed to pd.concat().
If the merged CSV is going to be used in Python, use glob to get a list of the files, pass it to fileinput.input() via the files argument, and then use the csv module to read everything in one go. Note that this also copies the header line of each file.
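The glob-based approach described above can be sketched roughly as follows. Note that it stacks the rows of several files on top of each other; it is not a key-based join like the one the question asks for. The helper name `concat_csvs` and the file names `part1.csv` and `part2.csv` are made up for illustration, and the demo writes its sample files into a temporary folder.

```python
import glob
import os
import tempfile

import pandas as pd


def concat_csvs(folder: str) -> pd.DataFrame:
    """Read every *.csv file in `folder` and stack the rows into one DataFrame."""
    pattern = os.path.join(folder, "*.csv")   # e.g. "/some/folder/*.csv"
    paths = sorted(glob.glob(pattern))        # sorted for a stable file order
    frames = [pd.read_csv(p) for p in paths]
    return pd.concat(frames, ignore_index=True)


# Demo with two small, hypothetical files written to a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    samples = [("part1.csv", "id,result\n1,a\n"), ("part2.csv", "id,result\n2,b\n")]
    for name, text in samples:
        with open(os.path.join(folder, name), "w") as fh:
            fh.write(text)
    combined = concat_csvs(folder)

print(combined)
```

Because pd.read_csv() parses each header separately, the header line is not repeated in the combined data.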
Try this; it works as well:
import pandas as pd
a = pd.read_csv("input.csv")
b = pd.read_csv("output.csv")
c = pd.merge(a, b, on='id', how='left')
lst = []
for i in c.index:
    # Prefer the value from the second CSV (result_y) when it exists
    if pd.notna(c.loc[i, 'result_y']):
        lst.append(c.loc[i, 'result_y'])
    else:
        lst.append(c.loc[i, 'result_x'])
c['result'] = pd.Series(lst, index=c.index)
del c['result_x']
del c['result_y']
This will combine the columns as desired:
import pandas as pd
a = pd.read_csv("input.csv")
b = pd.read_csv("output.csv")
merged = a.merge(b, on='id', how='left')
def merge_results(row):
    # Use the second CSV's value (result_y) unless it is missing (NaN)
    y = row['result_y']
    return row['result_x'] if pd.isna(y) else y
merged['result'] = merged.apply(merge_results, axis=1)
del merged['result_x']
del merged['result_y']
merged.to_csv("final_output.csv", index=False)
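The same override can probably be done without a row-wise apply. The sketch below is one possible variant, not the only way: it gives the second CSV's column a suffix and then uses Series.combine_first to fall back to the original value where the second CSV has none. The helper name `left_join_override` and the `_new` suffix are invented for illustration, and the sample data from the question is read from in-memory strings instead of files.

```python
import io

import pandas as pd

# Sample data copied from the question; with real files, use
# pd.read_csv("input.csv") and pd.read_csv("output.csv") instead.
INPUT_CSV = '''id,scenario,data1,data2,result
1,s1,300,400,"{s1,not added}"
2,s2,500,101,"{s2 added}"
3,s3,600,202,
'''
OUTPUT_CSV = '''id,result
1,"{s1,added}"
3,"{s3,added}"
'''


def left_join_override(a: pd.DataFrame, b: pd.DataFrame) -> pd.DataFrame:
    """Left-join b onto a by 'id'; b's 'result' wins wherever it is present."""
    # Keep a's column name unchanged and suffix only b's overlapping column.
    merged = a.merge(b, on="id", how="left", suffixes=("", "_new"))
    # Take result_new where it is not NaN, otherwise keep the original result.
    merged["result"] = merged["result_new"].combine_first(merged["result"])
    return merged.drop(columns="result_new")


a = pd.read_csv(io.StringIO(INPUT_CSV))
b = pd.read_csv(io.StringIO(OUTPUT_CSV))
final = left_join_override(a, b)
print(final.to_csv(index=False))
```

Since the left frame keeps the plain "result" name, the column order of input.csv is preserved and only the temporary "result_new" column has to be dropped.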