I have a very large tab-delimited text file. Many lines in the file share the same value in the first column, and I want to merge those lines into one. For example:
a foo
a bar
a foo2
b bar
c bar2
After running the script, it should become:
a foo;bar;foo2
b bar
c bar2
How can I do this in either a shell script or in Python?
Thanks.
With awk, you can try this:
{ a[$1] = a[$1] ";" $2 }
END { for (item in a ) print item, a[item] }
If you save this awk script in a file called awkf.awk and your input file is ifile.txt, run it like this:
awk -f awkf.awk ifile.txt | sed 's/ ;/ /'
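The sed command removes the leading ; from each merged value. Note that awk does not guarantee the order in which for (item in a) visits the keys, so pipe the output through sort if you need the keys in order.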
Hope this helps
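from collections import defaultdict
# Map each key (first column) to the list of its values (second column).
items = defaultdict(list)
for line in open('sourcefile'):
    # Strip the line ending so it does not end up inside the joined values.
    key, val = line.rstrip('\r\n').split('\t', 1)
    items[key].append(val)
result = open('result', 'w')
for k in sorted(items):
    result.write('%s\t%s\n' % (k, ';'.join(items[k])))
result.close()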
Not tested.
Tested with Python 2.7:
import csv
data = {}
reader = csv.DictReader(open('infile', 'r'), fieldnames=['key', 'value'], delimiter='\t')
for row in reader:
    if row['key'] in data:
        data[row['key']].append(row['value'])
    else:
        data[row['key']] = [row['value']]
writer = open('outfile', 'w')
for key in data:
    writer.write(key + '\t' + ';'.join(data[key]) + '\n')
writer.close()
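Since the file is described as very large, here is a streaming variant that never holds more than one key's values in memory. It is only a minimal sketch and rests on two assumptions that the question does not confirm: rows with the same key are already adjacent (as they are in the example), and every non-blank line has exactly two tab-separated columns. The function name merge_sorted_rows and the in-memory sample are made up for illustration.

```python
import io
from itertools import groupby


def merge_sorted_rows(lines):
    """Yield 'key<TAB>v1;v2;...' for each run of consecutive lines sharing a key."""
    # Split each non-blank line into [key, value] on the first tab only.
    rows = (line.rstrip('\r\n').split('\t', 1) for line in lines if line.strip())
    # groupby merges only adjacent rows, so the input must already be grouped by key.
    for key, group in groupby(rows, key=lambda row: row[0]):
        yield '%s\t%s' % (key, ';'.join(row[1] for row in group))


# The data from the question, using an in-memory stand-in for the real file.
sample = io.StringIO(u'a\tfoo\na\tbar\na\tfoo2\nb\tbar\nc\tbar2\n')
merged = list(merge_sorted_rows(sample))
for line in merged:
    print(line)
```

For a real file, pass the open file object instead of sample and write each yielded line to the output as it arrives. If the rows are not grouped by key, sort the file by its first column beforehand, or use one of the dictionary-based answers above.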