What regular expression can remove duplicate items from a string?

Given a string of identifiers separated by :, is it possible to construct a regular expression to extract the unique identifiers into another string, also separated by :?

How is it possible to achieve this using a regular expression? I have tried s/(:[^:])(.*)\1/$1$2/g with no luck, because the (.*) is greedy and skips to the last match of $1.

Example: a:b:c:d:c:c:x:c:c:e:e:f should give a:b:c:d:x:e:f

Note: I am coding in Perl, but I would very much prefer a regex-based solution for this.
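To make the failure concrete, here is a minimal sketch that runs the attempted substitution on the example string. The sub name `attempt` is made up for illustration; the pattern is the one quoted in the question, unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the attempted substitution.
sub attempt {
    my ($str) = @_;
    # (.*) is greedy, so \1 is matched at the LAST later copy of the item,
    # and only that one copy is dropped per match.
    $str =~ s/(:[^:])(.*)\1/$1$2/g;
    return $str;
}

print attempt('a:b:c:d:c:c:x:c:c:e:e:f'), "\n";   # prints a:b:c:d:c:c:x:c:e:f
```

Only the last `c` and the last `e` are removed; the copies in between survive because a single pass never revisits text it has already consumed.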

Tom asked Jul 22 '10 14:07


3 Answers

In .NET, which supports unbounded repetition inside lookbehind, you could search for

(?<=\b\1:.*)\b(\w+):?

and replace all matches with the empty string.

Perl (at least Perl 5) only supports fixed-length lookbehinds, so you can try the following (using lookahead, with a subtly different result):

\b(\w+):(?=.*\b\1:?)

If you replace that with the empty string, all previous repetitions of a duplicate entry will be removed; the last one will remain. So instead of

a:b:c:d:x:e:f

you would get

a:b:d:x:c:e:f

If that is OK, you can use

$subject =~ s/\b(\w+):(?=.*\b\1:?)//g;

Explanation:

First regex:

(?<=\b\1:.*): Check if you can match the contents of backreference no. 1, followed by a colon, somewhere before in the string.

\b(\w+):?: Match an identifier (from a word boundary to the next :), optionally followed by a colon.

Second regex:

\b(\w+):: Match an identifier and a colon.

(?=.*\b\1:?): Then check whether you can match the same identifier, optionally followed by a colon, somewhere ahead in the string.
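The lookahead version can be tried out with a short Perl sketch like the one below. The sub name `keep_last` is invented for illustration. One deliberate deviation, which is an assumption on my part and not part of the regex above: the lookahead ends in `\b` instead of `:?`, so that a short identifier such as `c` is not treated as a duplicate of a longer one such as `cd`.

```perl
use strict;
use warnings;

# Hypothetical helper: drop every identifier that occurs again later,
# so only the last occurrence of each one survives.
sub keep_last {
    my ($subject) = @_;
    # \b(\w+):        an identifier followed by its colon
    # (?=.*\b\1\b)    the same whole identifier appears again further on
    #                 (trailing \b is an assumption, see the note above)
    $subject =~ s/\b(\w+):(?=.*\b\1\b)//g;
    return $subject;
}

print keep_last('a:b:c:d:c:c:x:c:c:e:e:f'), "\n";   # prints a:b:d:x:c:e:f
```

Like the original pattern, this assumes identifiers consist of word characters only.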

Tim Pietzcker answered Sep 29 '22 11:09


Check out: http://www.regular-expressions.info/duplicatelines.html

Always a useful site when thinking about any regular expression.

Noon Silk answered Sep 29 '22 11:09


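use feature 'say';

my $str = q!a:b:c:d:c:c:x:c:c:e:e:f!;

# Repeat until nothing changes: each match drops the nearest later copy of an item.
1 while $str =~ s/(:[^:]+)(.*?)\1/$1$2/g;

say $str;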

Output:

a:b:c:d:x:e:f
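Note that the loop above relies on two things that hold for the example but not in general: the first item is never captured (it has no leading colon), and `\1` can also match the beginning of a longer item. A possible variation that drops both assumptions is sketched below. The sub name `keep_first` and the anchoring with `(?:^|:)` and `(?=:|$)` are my own additions, and the sketch assumes items are non-empty.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the first occurrence of each non-empty item.
sub keep_first {
    my ($str) = @_;
    # (?:^|:)([^:]+)    a whole item, including the very first one
    # (?::[^:]+)*?      as few intermediate items as possible
    # :\2(?=:|$)        a later copy of that item, matched in full
    # Each pass removes one later copy; loop until none is left.
    1 while $str =~ s/((?:^|:)([^:]+)(?::[^:]+)*?):\2(?=:|$)/$1/;
    return $str;
}

print keep_first('a:b:c:d:c:c:x:c:c:e:e:f'), "\n";   # prints a:b:c:d:x:e:f
print keep_first('a:b:a:cd:c:cd'), "\n";             # prints a:b:cd:c
```

The second call shows a repeated first item being removed, while `c` and `cd` are kept apart.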
Toto answered Sep 29 '22 13:09