combine/merge two csv using pandas/python


I have two CSV files that I want to combine with a left join on the key column "id". Both files have a non-key column named "result". Where the second CSV has a value in "result", it should override the value from the first CSV. How can I achieve this using pandas or any other scripting language? My expected final output is shown below.

Input

input.csv:

id,scenario,data1,data2,result
1,s1,300,400,"{s1,not added}"
2,s2,500,101,"{s2 added}"
3,s3,600,202,

output.csv:

id,result
1,"{s1,added}"
3,"{s3,added}"

Expected Output

final_output.csv

id,scenario,data1,data2,result
1,s1,300,400,"{s1,added}"
2,s2,500,101,"{s2 added}"
3,s3,600,202,"{s3,added}"

Current Code:

import pandas as pd

a = pd.read_csv("input.csv")
b = pd.read_csv("output.csv")
merged = a.merge(b, on='id', how='left')
merged.to_csv("final_output.csv", index=False)

Question:

Using this code I am getting the result column twice. I want it only once, and its value should be overridden whenever the second CSV has a value in that column. How do I get a single result column?

Madhura Mhatre asked Jan 16 '17 04:01



2 Answers

Try this; it works as well:

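import pandas as pd

a = pd.read_csv("input.csv")
b = pd.read_csv("output.csv")
c = pd.merge(a, b, on='id', how='left')

# Prefer the value from output.csv (result_y); fall back to input.csv (result_x).
# Empty CSV cells are read as NaN, so test with pd.isna instead of comparing to ''.
lst = []
for i in c.index:
    if pd.isna(c.loc[i, 'result_y']):
        lst.append(c.loc[i, 'result_x'])
    else:
        lst.append(c.loc[i, 'result_y'])
c['result'] = lst
del c['result_x']
del c['result_y']
c.to_csv("final_output.csv", index=False)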
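
The result_x and result_y names used above do not appear in either CSV; they come from the default suffixes that pandas applies when both frames share a non-key column name. A minimal sketch of that behaviour, assuming the question's two files are reproduced inline with io.StringIO (an assumption made here so the snippet runs without files on disk):

```python
import io
import pandas as pd

# Inline copies of the question's two files (assumption: same content as on disk).
input_csv = io.StringIO(
    'id,scenario,data1,data2,result\n'
    '1,s1,300,400,"{s1,not added}"\n'
    '2,s2,500,101,"{s2 added}"\n'
    '3,s3,600,202,\n'
)
output_csv = io.StringIO(
    'id,result\n'
    '1,"{s1,added}"\n'
    '3,"{s3,added}"\n'
)

a = pd.read_csv(input_csv)
b = pd.read_csv(output_csv)

# Both frames have a non-key column named "result", so merge keeps both and
# disambiguates them with the default suffixes "_x" (left) and "_y" (right).
c = pd.merge(a, b, on='id', how='left')
print(list(c.columns))

# Missing values show up as NaN: id 2 has no match in the right frame,
# and id 3 has an empty "result" cell in the left frame.
print(c[['id', 'result_x', 'result_y']])
```

This is why the merged file shows the result column twice, and why a missing value has to be detected with pd.isna rather than by comparing against an empty string.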
Mahesh answered Oct 11 '22 13:10


This will combine the columns as desired:

import pandas as pd

a = pd.read_csv("input.csv")
b = pd.read_csv("output.csv")
merged = a.merge(b, on='id', how='left')

def merge_results(row):
    # Unmatched or empty cells are read as NaN; keep input.csv's value in that case.
    y = row['result_y']
    return row['result_x'] if pd.isna(y) else y

merged['result'] = merged.apply(merge_results, axis=1)
del merged['result_x']
del merged['result_y']

merged.to_csv("final_output.csv", index=False)
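
The same override can also be written without a row-wise apply. The following is only a sketch of one possible alternative, using Series.combine_first to take the second file's value where it exists and the first file's value otherwise. The suffix names _old and _new and the inline io.StringIO data are assumptions chosen for illustration, not part of the original code:

```python
import io
import pandas as pd

# Inline copies of the question's two files (assumption: same content as on disk).
a = pd.read_csv(io.StringIO(
    'id,scenario,data1,data2,result\n'
    '1,s1,300,400,"{s1,not added}"\n'
    '2,s2,500,101,"{s2 added}"\n'
    '3,s3,600,202,\n'
))
b = pd.read_csv(io.StringIO(
    'id,result\n'
    '1,"{s1,added}"\n'
    '3,"{s3,added}"\n'
))

# Left join on "id"; the explicit suffixes (illustrative names) mark which
# file each "result" column came from.
merged = a.merge(b, on='id', how='left', suffixes=('_old', '_new'))

# combine_first keeps result_new where it is not NaN and fills the gaps
# from result_old, which is the override rule the question describes.
merged['result'] = merged['result_new'].combine_first(merged['result_old'])
merged = merged.drop(columns=['result_old', 'result_new'])

print(merged)
```

To write the file as in the original code, merged.to_csv("final_output.csv", index=False) can follow unchanged.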
Stephen Rauch answered Oct 11 '22 12:10