
Merge multiple lines into a single line by the value of a column

Tags: python, split, perl

I have a very large tab-delimited text file. Many lines in the file share the same value in one of the columns, and I want to merge those lines into a single line. For example:

a foo
a bar
a foo2
b bar
c bar2

After running the script, it should become:

a foo;bar;foo2
b bar
c bar2

How can I do this in either a shell script or Python?

Thanks.

Jianguo asked Jun 15 '11 14:06


3 Answers

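With awk you can try this:

BEGIN { FS = OFS = "\t" }                     # tab-separated input and output
{ a[$1] = ($1 in a) ? a[$1] ";" $2 : $2 }     # join the values of each key with ";"
END { for (item in a) print item, a[item] }   # awk does not guarantee key order

Save this awk script in a file called awkf.awk. If your input file is ifile.txt, run the script with:

awk -f awkf.awk ifile.txt

The ($1 in a) test stores the first value for a key as-is, so the merged list never starts with a stray ;. Pipe the output through sort if you need the keys in order.

Hope this helps.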

Sai answered Nov 19 '22 01:11


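from collections import defaultdict

# Group the values by key, keeping values in the order they appear.
items = defaultdict(list)
with open('sourcefile') as source:
    for line in source:
        # Strip the newline so it does not end up inside the joined values.
        key, val = line.rstrip('\n').split('\t', 1)
        items[key].append(val)

# Write one line per key, with the keys in sorted order.
with open('result', 'w') as result:
    for k in sorted(items):
        result.write('%s\t%s\n' % (k, ';'.join(items[k])))

Not tested.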
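
One way to check the grouping logic without touching any files is to run it on the sample from the question. This is only a minimal sketch: the helper name merge_by_key and the sep parameter are hypothetical, not part of the answer. It strips the trailing newline before splitting so that the newline does not end up inside the joined values, and it assumes every non-blank line contains at least one tab.

```python
from collections import defaultdict


def merge_by_key(lines, sep=';'):
    """Group 'key<TAB>value' lines by key and join each key's values with sep.

    merge_by_key is a hypothetical helper name used for illustration only.
    """
    items = defaultdict(list)
    for line in lines:
        line = line.rstrip('\n')
        if not line:
            continue  # skip blank lines
        # Split on the first tab only, so the value may itself contain tabs.
        key, val = line.split('\t', 1)
        items[key].append(val)
    # One output row per key, keys in sorted order.
    return ['%s\t%s' % (k, sep.join(items[k])) for k in sorted(items)]


# The sample data from the question, as tab-separated lines.
sample = ['a\tfoo\n', 'a\tbar\n', 'a\tfoo2\n', 'b\tbar\n', 'c\tbar2\n']
merged = merge_by_key(sample)
for row in merged:
    print(row)
```

Because the helper accepts any iterable of lines, an open file object can be passed in directly instead of the sample list.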

dugres answered Nov 19 '22 01:11


Tested with Python 2.7:

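import csv

data = {}

# Read the tab-separated rows as dicts with the fields 'key' and 'value'.
with open('infile', 'r') as infile:
    reader = csv.DictReader(infile, fieldnames=['key', 'value'], delimiter='\t')
    for row in reader:
        # Collect every value seen for a key, in input order.
        data.setdefault(row['key'], []).append(row['value'])

# Write one line per key; key order is arbitrary for Python 2.7 dicts.
with open('outfile', 'w') as writer:
    for key in data:
        writer.write(key + '\t' + ';'.join(data[key]) + '\n')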
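
For Python 3, the same csv-based idea might look like the sketch below. It is an illustration under stated assumptions, not the answer's tested code: the function name merge_file is hypothetical, the demo writes the question's sample to temporary files, and it relies on dicts preserving insertion order (Python 3.7+) so keys come out in first-seen order. Note that the csv module treats double quotes as quoting characters by default, which may matter if the data contains them.

```python
import csv
import os
import tempfile


def merge_file(in_path, out_path):
    """Merge rows that share a key (hypothetical helper, for illustration)."""
    data = {}
    # The csv documentation recommends newline='' for files passed to csv.
    with open(in_path, newline='') as src:
        for row in csv.reader(src, delimiter='\t'):
            if len(row) < 2:
                continue  # skip blank or malformed lines
            data.setdefault(row[0], []).append(row[1])
    with open(out_path, 'w', newline='') as dst:
        writer = csv.writer(dst, delimiter='\t', lineterminator='\n')
        # Dicts keep insertion order in Python 3.7+, so keys stay first-seen.
        for key, values in data.items():
            writer.writerow([key, ';'.join(values)])


# Demo on the sample from the question, using temporary files.
tmpdir = tempfile.mkdtemp()
in_path = os.path.join(tmpdir, 'infile')
out_path = os.path.join(tmpdir, 'outfile')
with open(in_path, 'w') as f:
    f.write('a\tfoo\na\tbar\na\tfoo2\nb\tbar\nc\tbar2\n')

merge_file(in_path, out_path)

with open(out_path, newline='') as f:
    output = f.read()
print(output)
```

Like the other dictionary-based answers, this holds every key and value in memory, so it suits files that fit in RAM.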

Scott answered Nov 19 '22 02:11