Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python Joining csv files where key is first column value

Tags:

python

csv

I try to join two csv files where key is value of first column. There's no header.
Files have different number of lines and rows.
Order of file a must be preserved.

file a:

john,red,34
andrew,green,18
tonny,black,50
jack,yellow,27
phill,orange,45
kurt,blue,29
mike,pink,61

file b:

tonny,driver,new york
phill,scientist,boston

desired result:

john,red,34
andrew,green,18
tonny,black,50,driver,new york
jack,yellow,27
phill,orange,45,scientist,boston
kurt,blue,29
mike,pink,61

I examined all related threads and I am sure that some of you are gonna mark this question duplicate but I simply have not found solution yet.

I grabbed dictionary based solution but this approach does not handle preserve line order from file 'a' condition.

import csv
from collections import defaultdict

with open('a.csv') as f:
    r = csv.reader(f, delimiter=',')
    dict1 = {}
    for row in r:
        dict1.update({row[0]: row[1:]})

with open('b.csv') as f:
    r = csv.reader(f, delimiter=',')
    dict2 = {}
    for row in r:
        dict2.update({row[0]: row[1:]})

result = defaultdict(list)

for d in (dict1, dict2):
    for key, value in d.iteritems():
        result[key].append(value)

I also would like to avoid putting these csv files to the database like sqlite or using pandas module.

Thanks in advance

like image 404
daikini Avatar asked Dec 06 '13 01:12

daikini


People also ask

How do I merge CSV files in the same column in Python?

To merge all CSV files, use the GLOB module. The os. path. join() method is used inside the concat() to merge the CSV files together.


1 Answers

Something like

import csv
from collections import OrderedDict

with open('b.csv', 'rb') as f:
    r = csv.reader(f)
    dict2 = {row[0]: row[1:] for row in r}

with open('a.csv', 'rb') as f:
    r = csv.reader(f)
    dict1 = OrderedDict((row[0], row[1:]) for row in r)

result = OrderedDict()
for d in (dict1, dict2):
    for key, value in d.iteritems():
        result.setdefault(key, []).extend(value)

with open('ab_combined.csv', 'wb') as f:
    w = csv.writer(f)
    for key, value in result.iteritems():
        w.writerow([key] + value)

produces

john,red,34
andrew,green,18
tonny,black,50,driver,new york
jack,yellow,27
phill,orange,45,scientist,boston
kurt,blue,29
mike,pink,61

(Note that I didn't bother protecting against the case where dict2 has a key which isn't in dict1-- that's easily added if you like.)

like image 150
DSM Avatar answered Nov 03 '22 09:11

DSM