I have a file that is of the following format:
text number number A;A;A;A;A;A
text number number B
text number number C;C;C;C;D;C;C;C;C
What I want to do is remove all repeats of the entries in the fourth column to end up with this:
text number number A
text number number B
text number number C;D
I'd prefer to use bash scripting for a solution to fit into a pipe with other text manipulation I'm doing to this file.
Thanks!
You can achieve this using awk. First split field 4 into an array on ";", then rebuild the field from its distinct entries:
awk '{delete seen; d=""; n=split($4,arr,";"); for (k=1; k<=n; k++) if (!seen[arr[k]]++) d=d (d==""?"":";") arr[k]; print $1,$2,$3,d}' file_name
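Since the goal is to drop this into a pipe, here is a minimal sketch of the same idea wrapped in a shell function. The name dedupe_col4 is made up for illustration. It assumes whitespace-separated columns with the ";" list in column 4, and it leaves lines with fewer than four columns untouched.

```shell
# Hypothetical helper: reads lines on stdin, writes cleaned lines on stdout.
dedupe_col4() {
  awk 'NF >= 4 {
    n = split($4, parts, ";")      # entries of column 4, in input order
    split("", seen)                # portable way to empty the array
    out = ""
    for (i = 1; i <= n; i++) {
      if (!(parts[i] in seen)) {   # first time this entry appears on the line
        seen[parts[i]] = 1
        out = (out == "" ? parts[i] : out ";" parts[i])
      }
    }
    $4 = out                       # assigning a field rebuilds the line with OFS
  }
  { print }'
}
```

It can then sit anywhere in a pipeline, for example dedupe_col4 < file_name | sort. Because the fourth field is reassigned, awk rejoins the columns with single spaces, so any wider original spacing or tabs are not preserved.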
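This might work for you (GNU sed), as long as each entry is a single character as in the sample; longer entries that share a first character can be merged incorrectly: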
sed 's/.*\s/&\n/;h;s/.*\n//;:a;s/\(\([^;]\).*\);\2/\1/;ta;H;g;s/\n.*\n//' file
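To make the one-liner easier to follow, here is the same script unpacked with one command per line and a comment on each step. It is a sketch that assumes GNU sed (for \s and \n) and single-character entries. The function name dedupe_sed is made up for illustration.

```shell
# Hypothetical helper: same commands as the one-liner, one per line.
dedupe_sed() {
  sed '
    # put a newline after the last whitespace, i.e. before column 4
    s/.*\s/&\n/
    # save the marked line in the hold space
    h
    # keep only column 4 in the pattern space
    s/.*\n//
    :a
    # drop one later repeat of a single-character entry, then try again
    s/\(\([^;]\).*\);\2/\1/
    ta
    # append the cleaned column to the saved line and fetch the result
    H
    g
    # delete everything from the first newline through the last one
    s/\n.*\n//
  '
}
```

Unlike the awk version, this leaves the spacing of the first three columns exactly as it was in the input.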