Removing all duplicate entries in a field

Question

I have a file that is of the following format:

text   number   number   A;A;A;A;A;A
text   number   number   B
text   number   number   C;C;C;C;D;C;C;C;C

What I want to do is remove all repeats of the entries in the fourth column to end up with this:

text   number   number   A
text   number   number   B
text   number   number   C;D

I'd prefer to use bash scripting for a solution to fit into a pipe with other text manipulation I'm doing to this file.

Thanks!

iruvar · Accepted Answer

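You can achieve this using awk. First split field 4 into an array using ; as the separator, then use the entries as keys of a second array so that each one is kept only once: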

awk '{delete z; d=""; split($4,arr,";");for (k in arr) z[arr[k]]=k; for (l in z) d=d";"l; print($1,$2,$3,substr(d, 2))}' file_name
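One thing to watch for: awk does not specify the order in which `for (k in arr)` visits array elements, so a loop of that kind may print the entries in a different order than they appeared (D;C rather than C;D, for example). If order matters, a variant along these lines might help. It is only a sketch: the function name dedupe_field4 and the array names are made up for illustration, and like the one-liner above it joins the output fields with single spaces.

```shell
# Sketch: de-duplicate the ';'-separated entries in field 4, keeping first-seen order.
# dedupe_field4 is a hypothetical helper name; it reads stdin or the named files.
dedupe_field4() {
  awk '{
    n = split($4, parts, ";")        # parts[1..n] holds the entries in input order
    split("", seen)                  # portable way to empty the seen array
    out = ""
    for (i = 1; i <= n; i++)
      if (!seen[parts[i]]++)         # true only the first time an entry appears
        out = (out == "" ? "" : out ";") parts[i]
    print $1, $2, $3, out
  }' "$@"
}

printf 'text 1 2 A;A;A\ntext 3 4 C;C;D;C\n' | dedupe_field4
```

Since it reads standard input when no file is given, it can sit in the middle of a pipe.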

potong · Answer

This might work for you (GNU sed):

sed 's/.*\s/&\n/;h;s/.*\n//;:a;s/\(\([^;]\).*\);\2/\1/;ta;H;g;s/\n.*\n//' file
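In case the one-liner is hard to follow, here is how a script of this shape could be laid out one command per line with comments. This is a sketch assuming GNU sed (it relies on `\s` and on `\n` in the replacement), and the wrapper name dedupe_sed is made up. Note that the back-reference matches a single character, so it appears to be safe only when every entry in the last field is one character long, as in the sample data.

```shell
# Sketch (GNU sed assumed): de-duplicate single-character entries in the last field.
# dedupe_sed is a hypothetical helper name; it reads stdin or the named files.
dedupe_sed() {
  sed '
    # Put a newline after the last whitespace, marking where the last field starts.
    s/.*\s/&\n/
    # Save the marked line in the hold space.
    h
    # Keep only the last field in the pattern space.
    s/.*\n//
    :a
    # Drop one later ";X" that repeats an earlier character X, then loop until none remain.
    s/\(\([^;]\).*\);\2/\1/
    ta
    # Append the cleaned field to the saved line, then copy everything back.
    H
    g
    # Delete from the first newline through the last, i.e. the original last field.
    s/\n.*\n//
  ' "$@"
}

printf 'text 1 2 C;C;C;C;D;C;C;C;C\n' | dedupe_sed
```

Unlike the awk approach, this leaves the whitespace between the first three columns untouched.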
