Merge multiple lines into a single line by value of a column

Question

I have a very large tab-delimited text file. Many lines share the same value in one of the columns, and I want to merge those lines into a single line. For example:

a foo
a bar
a foo2
b bar
c bar2

After running the script, it should become:

a foo;bar;foo2
b bar
c bar2

How can I do this in either a shell script or Python?

Thanks.

Sai · Accepted Answer

With awk you can try this:

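# Append each value (column 2) to the string stored under its key (column 1)
{ a[$1] = a[$1] ";" $2 }
# Print every key with its collected values; the iteration order is unspecified
END { for (item in a) print item, a[item] }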

If you save this awk script in a file called awkf.awk and your input file is ifile.txt, run the script like this:

awk -f awkf.awk ifile.txt | sed 's/ ;/ /'

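The sed command removes the leading ; from each list of values. It appears because the awk script puts a ; in front of every value, including the first one.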

Hope this helps
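
For comparison, here is a minimal Python sketch of the same accumulate-per-key idea, written so that the separator is only placed between values and no cleanup step is needed. The function name merge_by_key and the sample list are hypothetical, introduced only for illustration, and the sketch assumes every line has exactly one key and one value separated by a tab.

```python
def merge_by_key(lines, sep=";"):
    """Group tab-delimited 'key<TAB>value' lines by key (hypothetical helper)."""
    merged = {}
    for line in lines:
        # Assumption: each line contains a key, a tab, and a value
        key, value = line.rstrip("\n").split("\t", 1)
        if key in merged:
            # Add the separator only between values, so no leading ';' appears
            merged[key] += sep + value
        else:
            merged[key] = value
    return merged


# Sample data taken from the question
sample = ["a\tfoo\n", "a\tbar\n", "a\tfoo2\n", "b\tbar\n", "c\tbar2\n"]
for key, values in sorted(merge_by_key(sample).items()):
    print(key + "\t" + values)
```

Sorting the keys before printing gives a stable order, which the awk for-in loop does not guarantee.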

dugres · Answer

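from collections import defaultdict

# Group the values (column 2) by key (column 1)
items = defaultdict(list)
with open('sourcefile') as source:
    for line in source:
        # Strip the newline so it does not end up inside the joined values
        key, val = line.rstrip('\n').split('\t', 1)
        items[key].append(val)

# Write one line per key, with its values joined by ';'
with open('result', 'w') as result:
    for k in sorted(items):
        result.write('%s\t%s\n' % (k, ';'.join(items[k])))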

not tested

Scott · Answer

Tested with Python 2.7:

import csv

# Map each key (column 1) to the list of its values (column 2)
data = {}

reader = csv.DictReader(open('infile', 'r'), fieldnames=['key', 'value'], delimiter='\t')
for row in reader:
    if row['key'] in data:
        data[row['key']].append(row['value'])
    else:
        data[row['key']] = [row['value']]

# Write one line per key; a plain dict has no guaranteed order in Python 2.7
writer = open('outfile', 'w')
for key in data:
    writer.write(key + '\t' + ';'.join(data[key]) + '\n')
writer.close()
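
Since the question mentions a very large file, here is a sketch of a streaming variant that avoids holding every key in memory. It relies on an assumption the question does not state outright: lines that share a key are adjacent, as they are in the example. If they are not, the file would have to be sorted first. The name merge_adjacent is hypothetical.

```python
from itertools import groupby


def merge_adjacent(lines, sep=";"):
    """Yield 'key<TAB>v1;v2;...' for each run of consecutive lines sharing a key.

    Hypothetical helper. Assumes lines with the same key are adjacent and that
    every non-blank line contains a key, a tab, and a value.
    """
    # Lazily split each non-blank line into [key, value]
    rows = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    # groupby only merges consecutive rows with an equal key
    for key, group in groupby(rows, key=lambda row: row[0]):
        yield key + "\t" + sep.join(row[1] for row in group)


# Sample data taken from the question
sample = ["a\tfoo\n", "a\tbar\n", "a\tfoo2\n", "b\tbar\n", "c\tbar2\n"]
for merged in merge_adjacent(sample):
    print(merged)
```

An open file object can be passed in place of the sample list, so only one group of lines is held in memory at a time.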

Tags:

python

split

perl

Jianguo
