
How to remove duplicate words from a string in a Bash script?

Tags:

bash

I have a string containing duplicate words, for example:

abc, def, abc, def

How can I remove the duplicates? The string that I need is:

abc, def

Thanh Tran asked May 18 '15 04:05


2 Answers

We have this test file:

$ cat file
abc, def, abc, def

To remove duplicate words:

$ sed -r ':a; s/\b([[:alnum:]]+)\b(.*)\b\1\b/\1\2/g; ta; s/(, )+/, /g; s/, *$//' file
abc, def

How it works

  • :a

    This defines a label a.

  • s/\b([[:alnum:]]+)\b(.*)\b\1\b/\1\2/g

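    This looks for a word of alphanumeric characters that occurs again later on the line and removes the later occurrence. The first group captures the word, the second captures everything between the two occurrences, and the replacement \1\2 keeps both while dropping the repeat. Because .* is greedy, each match removes the last repeat of that word.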

  • ta

    If the last substitution command resulted in a change, this jumps back to label a to try again.

    In this way, the code keeps looking for duplicates until none remain.

  • s/(, )+/, /g; s/, *$//

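    These two substitution commands clean up the leftover comma-space combinations: the first collapses each run of ", " left behind by removed words into a single ", ", and the second strips a trailing comma at the end of the line.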
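
To see what each stage contributes, the loop and the cleanup can be run separately. This is only an illustrative sketch: it assumes GNU sed (for -r, \b, and a semicolon after a label), and the variable names are made up for the example.

```shell
#!/bin/bash
# Assumes GNU sed, as in the command above.
input='abc, def, abc, def'

# Stage 1: only the label / substitute / branch loop, no cleanup yet.
after_loop=$(printf '%s\n' "$input" |
  sed -r ':a; s/\b([[:alnum:]]+)\b(.*)\b\1\b/\1\2/g; ta')

# Stage 2: collapse repeated ", " and drop a trailing comma.
cleaned=$(printf '%s\n' "$after_loop" |
  sed -r 's/(, )+/, /g; s/, *$//')

# Brackets make leftover separators and trailing spaces visible.
printf 'after loop: [%s]\n' "$after_loop"
printf 'cleaned:    [%s]\n' "$cleaned"
```

After the loop alone, the repeated words are gone but their separators remain, which is why the two cleanup substitutions are needed.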

Mac OSX or other BSD System

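On Mac OS X or another BSD system, try the following, which uses -E in place of -r and passes each command with its own -e. Note that \b is not part of POSIX regular expressions, so this still depends on the local sed recognizing it: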

sed -E -e ':a' -e 's/\b([[:alnum:]]+)\b(.*)\b\1\b/\1\2/g' -e 'ta' -e 's/(, )+/, /g' -e 's/, *$//' file

Using a string instead of a file

sed easily handles input either from a file, as shown above, or from a shell string as shown below:

$ echo 'ab, cd, cd, ab, ef' | sed -r ':a; s/\b([[:alnum:]]+)\b(.*)\b\1\b/\1\2/g; ta; s/(, )+/, /g; s/, *$//'
ab, cd, ef
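
Since the question asks about a string inside a Bash script, the same command could be wrapped in a small function that updates a variable. This is a sketch only: dedupe_words is a hypothetical helper name, it feeds the string to sed through a here-string instead of echo, and it assumes GNU sed as above.

```shell
#!/bin/bash
# dedupe_words is a hypothetical helper name; assumes GNU sed.
dedupe_words() {
  # "$1" is passed to sed on standard input via a here-string.
  sed -r ':a; s/\b([[:alnum:]]+)\b(.*)\b\1\b/\1\2/g; ta; s/(, )+/, /g; s/, *$//' <<< "$1"
}

string='ab, cd, cd, ab, ef'
string=$(dedupe_words "$string")
echo "$string"
```

With the sample string from above, this should print ab, cd, ef.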

John1024 answered Sep 18 '22 15:09


You can use awk to do this.

Example:

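#!/bin/bash
string="abc, def, abc, def"
# Split on runs of commas/whitespace (a regex RS works in gawk and mawk),
# print each word only the first time it is seen, and rejoin with ", "
# so that no trailing separator is left behind.
string=$(printf '%s\n' "$string" | awk -v RS='[,[:space:]]+' '$0 != "" && !a[$0]++ {printf "%s%s", sep, $0; sep=", "}')
echo "$string"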

Output:

abc, def
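
The !a[$0]++ test in the awk command is true only the first time a given word is seen. The same first-occurrence filter can be sketched in pure Bash with an associative array, which avoids depending on a particular awk. This is an assumption-laden sketch, not part of the answer above: it needs Bash 4 or later, and dedupe_csv is a hypothetical helper name.

```shell
#!/bin/bash
# Pure-Bash sketch; requires Bash 4+ for associative arrays.
# dedupe_csv is a hypothetical helper name.
dedupe_csv() {
  local IFS=', '        # split on commas and spaces
  local word out=''
  local -A seen=()      # words already emitted
  local -a words
  read -r -a words <<< "$1"
  for word in "${words[@]}"; do
    [[ -z $word ]] && continue                 # skip empty fields (e.g. ",,")
    [[ -n ${seen[$word]:-} ]] && continue      # skip repeats
    seen[$word]=1
    out+="${out:+, }$word"                     # join with ", " after the first word
  done
  printf '%s\n' "$out"
}

dedupe_csv 'abc, def, abc, def'
```

The function keeps words in their original order and always rejoins them with ", ", so differing separators in the input are normalized.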

Jahid answered Sep 20 '22 15:09