Hey there, regex lovers!
I'm quite into regex these days and ran into a purely theoretical problem. To keep it simple, I will present it as a game.
The game:
Let's say you have a list of words separated by spaces.
What I call a word is what regular expressions define as one: [a-zA-Z_0-9]+
(There is no empty word here)
Example list: Horse Banana Joker RoXx0r A_Long_Word Joker 1337
What I want you to do is replace each word except Joker with a number of $ equal to the number of characters in the matched word.
With our previous list we would obtain: $$$$$ $$$$$$ Joker $$$$$$ $$$$$$$$$$$ Joker $$$$
In fewer words: I want a regex that matches each character that does not belong to the word "Joker" (I mean an occurrence of the word Joker in the string, not the individual letters that make up Joker).
While it is not easy, it's not impossible (I have my own regex for that). That's why I will set some rules.
The rules:
Added rules:
To help you out, here are some strings on which the regex must work: Horse Banana Joker RoXx0r A_Long_Word Joker 1337 Joke Poker Joker Jokers
Must return after replacement: $$$$$ $$$$$$ Joker $$$$$$ $$$$$$$$$$$ Joker $$$$ $$$$ $$$$$ Joker $$$$$$
Joker Joker Joker
Must return after replacement: Joker Joker Joker
Again, solving the problem is not the goal here: I want to see different solutions, and more importantly I want to see the best ones!
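For checking candidate regexes against the expected outputs above, a reference implementation can help. The sketch below is not a solution to the game, since it relies on a replacement callback rather than a pure regex substitution; the function name maskExceptJoker is made up for illustration.

```javascript
// Reference implementation (NOT a regex-only solution): it uses a callback
// to decide, word by word, what to emit. Handy for producing expected outputs.
function maskExceptJoker(text) {
  // \w+ is the same word definition as [a-zA-Z_0-9]+
  return text.replace(/\w+/g, (word) =>
    word === 'Joker' ? word : '$'.repeat(word.length)
  );
}

console.log(maskExceptJoker('Horse Banana Joker RoXx0r A_Long_Word Joker 1337 Joke Poker Joker Jokers'));
console.log(maskExceptJoker('Joker Joker Joker'));
```

Any regex posted as a solution should give the same result as this function on the test strings.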
Solutions:
A very elegant one by Casimir et Hippolyte: (?:\G(?!^)|(?<!\S)(?!Joker(?:\s|$)))\S
(replace: $)
See the post
However, the \G takes the fun out of the problem and does not work in every language, so I can't accept it unless it is possible to create a custom delimiter that is equivalent to \G.
Almost accepted answer, also by Casimir et Hippolyte: ((?:\s+|\bJoker\b)*)\S((?:\s+Joker)*\s*$)?
(replace: $1$$2)
See the post
Does not work when there are only Joker words in the string.
A similar solution by ClasG: (\bJoker[^\w]+)\w|\w([^\w]+Joker\b)|\w
(replace: $1$$2)
See the post
Does not work when there are only Joker words in the string.
Another one by ClasG: [^Joker\s]|(?<!\b)J|J(?!oker\b)|(?<!\bJ)o|o(?!ker\b)|(?<!\bJo)k|k(?!er\b)|(?<!\bJoke)r|r(?!\b)|(?<!\bJok)e|e(?!r\b)
(replace: $)
See the post
Not very efficient, but it's another way of looking at things ;)
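The per-letter lookaround approach can be tried directly in any engine with lookbehind support. Here is a minimal JavaScript sketch, assuming a runtime that supports lookbehind (for example a recent Node.js); the names perLetter and maskPerLetter are made up for illustration.

```javascript
// Each letter of "Joker" is matched only when its left or right context
// shows that it is NOT part of a standalone "Joker" word.
const perLetter = /[^Joker\s]|(?<!\b)J|J(?!oker\b)|(?<!\bJ)o|o(?!ker\b)|(?<!\bJo)k|k(?!er\b)|(?<!\bJok)e|e(?!r\b)|(?<!\bJoke)r|r(?!\b)/g;

function maskPerLetter(text) {
  // In a JavaScript replacement string, "$$" stands for one literal "$".
  return text.replace(perLetter, '$$');
}

console.log(maskPerLetter('Horse Banana Joker RoXx0r A_Long_Word Joker 1337 Joke Poker Joker Jokers'));
console.log(maskPerLetter('Joker Joker Joker'));
```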
I came up with a similar regex after reading Rahul's comment below: (?(?<=\b|\bJ|\bJo|\bJok|\bJoke|\bJoker)(?!(?:Joke|oke|ke|e|)r\b)\w|\w)
(replace: $)
Regex101
It is also inefficient, but it uses the same lookaround-list idea :)
Here is my first solution:
I use a trick that might be considered cheating, but I don't think it is, because it does not alter the functions you use to replace characters. You just have to append a '$' to the end of the string before replacing characters in it.
So instead of something like: string = replace(string, regex, '$1$2')
We would have: string = replace(string+'$', regex, '$1$2')
So here is the regex: (\bJoker\b)|.$|\w(?=.*(\$))
(replace: $1$2)
Regex 101
This should work in all languages except those that do not support lookaheads (they are rather rare).
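The appended-'$' trick can be sketched in JavaScript as follows. The function name maskWithSentinel is made up for illustration; the regex and the replacement string are the ones given above.

```javascript
// Append a sentinel "$" so that the lookahead can capture it into group 2.
function maskWithSentinel(text) {
  const pattern = /(\bJoker\b)|.$|\w(?=.*(\$))/g;
  // Branch 1: "Joker" is captured in group 1 and re-inserted unchanged.
  // Branch 2: ".$" matches the sentinel itself; both groups are empty,
  //           so the sentinel disappears from the result.
  // Branch 3: any other word character; group 2 holds the sentinel "$".
  return (text + '$').replace(pattern, '$1$2');
}

console.log(maskWithSentinel('Horse Banana Joker RoXx0r A_Long_Word Joker 1337 Joke Poker Joker Jokers'));
console.log(maskWithSentinel('Joker Joker Joker'));
```

In JavaScript a group that did not take part in the match is replaced by an empty string, which is what makes '$1$2' work for all three branches.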
Keep posting new regexes if you find any; I want to see more ways to do it! :)
For PCRE/Perl/Ruby/Java/.NET:
find:
(?:\G(?!^)|(?<!\S)(?!Joker(?!\S)))\S
replace:
$
demo
pattern details:
(?:
    \G (?!^)          # contiguous to a previous match (but not at the start of the string)
  |                   # OR
    (?<!\S)           # not preceded by a non-whitespace character
    (?!Joker(?!\S))   # not followed by the forbidden word
)
\S                    # a non-whitespace character
If your words are only composed of word characters, you can simplify the pattern by playing with word and non-word boundaries: (?:\G\B|\b(?!Joker\b))\w
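JavaScript has no \G, so the patterns above cannot be used there as written. As a rough illustration of the same idea (a character is replaced when the word it belongs to starts at a position that is not followed by Joker), here is a sketch that swaps \G for a variable-length lookbehind. This adaptation is an assumption of mine and not part of the answer; it needs an engine that allows unbounded lookbehind, such as a recent Node.js, and the name maskLookbehind is made up.

```javascript
// Adaptation (not the original pattern): instead of chaining matches with \G,
// look back to the start of the current word and test the word there.
const viaLookbehind = /(?<=(?<!\S)(?!Joker(?!\S))\S*)\S/g;

function maskLookbehind(text) {
  // "$$" in a JavaScript replacement string is one literal "$".
  return text.replace(viaLookbehind, '$$');
}

console.log(maskLookbehind('Horse Banana Joker RoXx0r A_Long_Word Joker 1337 Joke Poker Joker Jokers'));
console.log(maskLookbehind('Joker Joker Joker'));
```

The lookbehind rescans the current word for every character, so this is likely to need more steps than the \G version.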
Another way (PCRE/Perl), without the \G feature and with the backtracking control verb (*SKIP) (needs fewer steps):
\s*(?:Joker(?:\s+|$))*(*SKIP)\K.
To be clear, (*SKIP) is only useful when the string ends with the forbidden word or a whitespace. You can also replace it with (*COMMIT).
demo
or:
\bJoker\b(*SKIP)(*F)|\S
and with the PyPI regex module for Python (which has one word boundary for the start of a word and another for the end):
\mJoker\M(*SKIP)(*F)|\S
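The (*SKIP)(*F) patterns work by consuming a whole Joker first and then discarding that match, so the \S branch never gets to see its letters. Engines without backtracking verbs can approximate this with a replacement callback. The sketch below is such an emulation in JavaScript, not the PCRE pattern itself, and the name maskSkipFail is made up for illustration.

```javascript
// Emulation of "\bJoker\b(*SKIP)(*F)|\S": the first branch swallows the
// forbidden word, and the callback puts it back unchanged.
function maskSkipFail(text) {
  return text.replace(/\bJoker\b|\S/g, (match) =>
    match === 'Joker' ? match : '$'
  );
}

console.log(maskSkipFail('Horse Banana Joker RoXx0r A_Long_Word Joker 1337 Joke Poker Joker Jokers'));
console.log(maskSkipFail('Joker Joker Joker'));
```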
One that works with JavaScript (only if there is something to replace):
find:
((?:\s+|\bJoker\b)*)\S((?:\s+Joker)*\s*$)?
replace: (backreference to group1, escaped $, backreference to group2)
$1$$$2
demo
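A minimal sketch of how this find/replace pair could be used from JavaScript; the names jsPattern and maskJs are made up for illustration.

```javascript
// Group 1: any spaces and Joker words standing before the character to replace.
// Group 2: trailing Joker words, captured only when they run to the end of the string.
const jsPattern = /((?:\s+|\bJoker\b)*)\S((?:\s+Joker)*\s*$)?/g;

function maskJs(text) {
  // "$1" + "$$" (a literal "$") + "$2"
  return text.replace(jsPattern, '$1$$$2');
}

console.log(maskJs('Horse Banana Joker RoXx0r A_Long_Word Joker 1337 Joke Poker Joker Jokers'));
// A string made only of Joker words is NOT left untouched by this pattern:
console.log(maskJs('Joker Joker Joker'));
```

The second call shows the limitation: with nothing else to replace, the regex backtracks into the last Joker and replaces its letters.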
Another JavaScript version that uses the y flag (which forces the matches to be contiguous), but unfortunately, at the time of writing, this one isn't supported by Internet Explorer, Safari and mobile browsers except Firefox mobile:
var strs = ['Horse Banana Joker RoXx0r A_Long_Word Joker 1337 Joke Poker Joker', 'Joker Joker Joker'];
strs.forEach(function (s) {
    // '$1$$' re-inserts the skipped spaces and Joker words, then a literal '$'
    console.log(s.replace(/(?=((?:\s+|\bJoker\b)*))\1./gy, '$1$$'));
});
The (?=(...))\1 construct emulates an atomic group (which forbids backtracking).
Can't really say why, but I wanted to see if I could do it without look-arounds. This is what I ended up with:
(\bJoker[^\w]+)\w|\w([^\w]+Joker\b)|\w
Substituting that with $1$$2 should do the trick.
It has one limitation though (that I can think of). It won't handle Joker as a single word on the line :(. That's because the logic behind it is...
It matches the word Joker in two alternations, either with a letter following it or with a letter preceding it. In both cases the word is separated from the letter by non-letters (spaces). There is a third alternative as well: a single letter. If neither of the first two matches, this one will find letters unrelated to Joker.
In the first two cases, the word plus the adjacent spaces (non-letters) gets captured into a group (Joker-space for the first alternative). The same goes for the second alternative, but in reversed order (space-Joker). The third alternative doesn't capture anything; it just matches a letter.
Replacing the complete match with $1$$2 (note the literal $ in the middle) inserts the word Joker plus spaces, followed by a $, if the first alternation matched.
If the first didn't match, but the second did, the inserted replacement would be the $ plus the captured spaces followed by Joker.
If neither of the first two matched, nothing is captured, and the only thing inserted will be the sole $, replacing whatever letter matched.
See it here at regex101.
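For anyone who wants to try it outside regex101, here is a minimal JavaScript sketch. The names noLookaround and maskNoLookaround are made up for illustration. One assumption to flag: in a JavaScript replacement string, $$ is itself an escape for a literal $, so the substitution has to be written as $1$$$2 there.

```javascript
// Alternative 1: "Joker" + separators (group 1), then one letter to replace.
// Alternative 2: one letter to replace, then separators + "Joker" (group 2).
// Alternative 3: any other single letter, nothing captured.
const noLookaround = /(\bJoker[^\w]+)\w|\w([^\w]+Joker\b)|\w/g;

function maskNoLookaround(text) {
  // JavaScript spelling of the "$1$$2" substitution: "$1", literal "$", "$2".
  return text.replace(noLookaround, '$1$$$2');
}

console.log(maskNoLookaround('Horse Banana Joker RoXx0r A_Long_Word Joker 1337 Joke Poker Joker Jokers'));
```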
Edit:
Just noticed that Casimir et Hippolyte has a version at the end that's similar to mine. They're not identical though, so I'll leave my answer here for now ;)