 

Removing all duplicate entries in a field

Tags:

bash

sed

awk

I have a file in the following format:

text   number   number   A;A;A;A;A;A
text   number   number   B
text   number   number   C;C;C;C;D;C;C;C;C

I want to remove all repeated entries in the fourth column so that I end up with this:

text   number   number   A
text   number   number   B
text   number   number   C;D

I'd prefer a bash-scripting solution so that it fits into a pipe with the other text manipulation I'm doing on this file.

Thanks!

JoshuaA asked Feb 19 '23 15:02


2 Answers

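You can achieve this with awk: split field 4 into an array on ";", drop the repeated entries, and print the first three fields followed by the remaining entries rejoined with ";":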

awk '{n = split($4, arr, ";"); out = ""; delete seen; for (i = 1; i <= n; i++) if (!seen[arr[i]]++) out = (out == "" ? arr[i] : out ";" arr[i]); print $1, $2, $3, out}' file_name
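Since the goal is to drop this into a pipe, here is a sketch of the same awk idea wrapped in a shell function and fed the sample data from the question. The function name dedup_field4 is hypothetical. It walks the split array by index, because awk leaves the order of a for (k in arr) loop unspecified, and it keeps the first occurrence of each entry. Note that print with commas rejoins the fields using OFS, so runs of spaces collapse to a single space.

```shell
#!/bin/sh
# dedup_field4 is a hypothetical helper name; it reads stdin and writes
# stdout, so it can sit anywhere in a pipeline.
dedup_field4() {
    awk '{
        n = split($4, arr, ";")      # break field 4 on semicolons
        out = ""
        delete seen                  # reset the lookup table for this line
        for (i = 1; i <= n; i++)     # walk by index to keep the input order
            if (!seen[arr[i]]++)     # true only the first time an entry appears
                out = (out == "" ? arr[i] : out ";" arr[i])
        print $1, $2, $3, out        # fields are rejoined with OFS (one space)
    }'
}

# Sample data from the question:
printf '%s\n' \
    'text   number   number   A;A;A;A;A;A' \
    'text   number   number   B' \
    'text   number   number   C;C;C;C;D;C;C;C;C' |
    dedup_field4
# text number number A
# text number number B
# text number number C;D
```

Because it only reads stdin, it can be chained like any other filter, e.g. some_command | dedup_field4 | sort.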
iruvar answered Feb 23 '23 13:02


This might work for you (GNU sed), provided each entry is a single character, as in your sample:

sed 's/.*\s/&\n/;h;s/.*\n//;:a;s/\(\([^;]\).*\);\2/\1/;ta;H;g;s/\n.*\n//' file
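The sed command above matches one character with [^;], so it assumes one-character entries; for example, it turns AB;A;AB into ABB. Below is a sketch of one way to extend the same hold-space approach to whole entries, assuming GNU sed (\s, \n and \? are GNU extensions). The function name dedup_field4_sed is hypothetical. It wraps field 4 in semicolons so every entry is delimited on both sides, then repeatedly deletes a later copy of an entry that already appeared, so the first occurrence and the original spacing before field 4 are kept.

```shell
#!/bin/sh
# Sketch only: needs GNU sed (\s, \n and \? are GNU extensions).
# dedup_field4_sed is a hypothetical helper name.
dedup_field4_sed() {
    sed '
        # split after the last whitespace: prefix, newline, field 4
        s/.*\s/&\n/
        # save the whole line, then keep only field 4 in the pattern space
        h
        s/.*\n//
        # wrap the field in semicolons so every entry is delimited on both sides
        s/.*/;&;/
        :a
        # delete a later copy of a whole entry X, keeping the first one, then retry
        s/\(;\([^;]*\);\(.*;\)\?\)\2;/\1/
        ta
        # unwrap, then glue the deduplicated field back onto the prefix
        s/^;//
        s/;$//
        H
        g
        s/\n.*\n//
    '
}

printf '%s\n' \
    'text   number   number   C;C;C;C;D;C;C;C;C' \
    'text   number   number   AB;A;AB;C' |
    dedup_field4_sed
# text   number   number   C;D
# text   number   number   AB;A;C
```

Unlike the awk version, this leaves everything before field 4 untouched, which may matter if the column alignment is significant later in the pipe.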
potong answered Feb 23 '23 12:02