Given a string of identifiers separated by :, is it possible to construct a regular expression to extract the unique identifiers into another string, also separated by :?
How is it possible to achieve this using a regular expression? I have tried s/(:[^:])(.*)\1/$1$2/g with no luck, because the (.*) is greedy and skips to the last match of $1.
Example: a:b:c:d:c:c:x:c:c:e:e:f should give a:b:c:d:x:e:f
Note: I am coding in Perl, but I would very much prefer a regex-based solution for this.
In .NET, which supports infinite repetition inside lookbehind, you could search for
(?<=\b\1:.*)\b(\w+):?
and replace all matches with the empty string.
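Perl 5 does not support unbounded repetition inside lookbehind (historically only fixed-length lookbehind; since 5.30, variable-length lookbehind is experimental and limited to 255 characters), so you can try the following instead (using lookahead, with a subtly different result):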
\b(\w+):(?=.*\b\1:?)
If you replace every match with the empty string, all earlier occurrences of a duplicated entry are removed and only the last one remains. So instead of
a:b:c:d:x:e:f
you would get
a:b:d:x:c:e:f
If that is OK, you can use
$subject =~ s/\b(\w+):(?=.*\b\1:?)//g;
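To make the difference between the two results concrete, here is a small runnable sketch. `dedupe_keep_last` wraps the substitution quoted above. `dedupe_keep_first` is not part of the answer: it is one possible workaround that reverses the string, applies the same lookahead idea, and reverses it back, so that the first occurrence survives. Both function names are made up for illustration, and the sketch assumes identifiers consist only of \w characters.

```perl
use strict;
use warnings;

# Keep the LAST occurrence of each identifier (the substitution quoted above).
sub dedupe_keep_last {
    my ($s) = @_;
    $s =~ s/\b(\w+):(?=.*\b\1:?)//g;
    return $s;
}

# Keep the FIRST occurrence: reverse the string, drop every entry that
# recurs later in the reversed string, then reverse the result back.
# The trailing \b makes sure "a" is not found inside a longer "ab".
sub dedupe_keep_first {
    my ($s) = @_;
    my $r = reverse $s;                    # scalar context: reverses the characters
    $r =~ s/\b(\w+):(?=.*\b\1\b)//g;
    return scalar reverse $r;
}

my $input = 'a:b:c:d:c:c:x:c:c:e:e:f';
print dedupe_keep_last($input),  "\n";     # a:b:d:x:c:e:f
print dedupe_keep_first($input), "\n";     # a:b:c:d:x:e:f
```

Reversing works here because a reversed identifier is still matched consistently against other reversed identifiers; only the order in which duplicates are seen changes.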
Explanation:
First regex:
(?<=\b\1:.*) : Check if you can match the contents of backreference no. 1, followed by a colon, somewhere before in the string.
\b(\w+):? : Match an identifier (from a word boundary to the next :), optionally followed by a colon.
Second regex:
\b(\w+): : Match an identifier and a colon.
(?=.*\b\1:?) : Then check whether you can match the same identifier, optionally followed by a colon, somewhere ahead in the string.
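One caveat with the lookahead as written: because the colon after \1 is optional, the backreference can also match the beginning of a longer identifier. With single-character identifiers this never shows up, but with input such as a:ab:c the entry a is treated as a duplicate. The sketch below contrasts the original pattern with a variant that puts \b after the backreference; the function names are hypothetical, and identifiers are again assumed to be made of \w characters.

```perl
use strict;
use warnings;

# Original lookahead: \1 may match a prefix of a longer identifier.
sub dedupe_loose {
    my ($s) = @_;
    $s =~ s/\b(\w+):(?=.*\b\1:?)//g;
    return $s;
}

# Variant: \b after the backreference, so only whole identifiers count.
sub dedupe_strict {
    my ($s) = @_;
    $s =~ s/\b(\w+):(?=.*\b\1\b)//g;
    return $s;
}

print dedupe_loose('a:ab:c'),  "\n";                     # ab:c  ("a" is wrongly dropped)
print dedupe_strict('a:ab:c'), "\n";                     # a:ab:c
print dedupe_strict('a:b:c:d:c:c:x:c:c:e:e:f'), "\n";    # a:b:d:x:c:e:f
```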
Check out: http://www.regular-expressions.info/duplicatelines.html
Always a useful site when thinking about any regular expression.
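use feature 'say';
my $str = q!a:b:c:d:c:c:x:c:c:e:e:f!;
# Repeat until nothing is substituted: a single pass removes only some duplicates.
1 while $str =~ s/(:[^:]+)(.*?)\1/$1$2/g;
say $str;
Output: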
a:b:c:d:x:e:f
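The loop above works for the example, but it has two limitations. The pattern requires a leading colon, so a repeat of the very first identifier is never removed (a:b:a stays unchanged), and \1 can match the start of a longer identifier (a:c:cd becomes a:cd). The following is a sketch of one way to tighten it, not the answerer's code; `dedupe_loop` is a hypothetical name.

```perl
use strict;
use warnings;

sub dedupe_loop {
    my ($s) = @_;
    # (^|:)        an identifier starts at the beginning or after a colon
    # ([^:]+)(?=:) the complete identifier (it must be followed by a colon)
    # (.*?)        whatever lies between the two occurrences
    # :\2(?=:|$)   a later, whole occurrence of the same identifier
    # Each pass removes one later duplicate; repeat until none is left.
    1 while $s =~ s/(^|:)([^:]+)(?=:)(.*?):\2(?=:|$)/$1$2$3/;
    return $s;
}

print dedupe_loop('a:b:c:d:c:c:x:c:c:e:e:f'), "\n";   # a:b:c:d:x:e:f
print dedupe_loop('a:b:a'), "\n";                     # a:b
print dedupe_loop('a:c:cd'), "\n";                    # a:c:cd
```

The first occurrence of each identifier is kept, which matches the order requested in the question.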