Is it possible to use a regular expression to match all words, but match each unique word only once? I am aware there are other ways of doing this; however, I'm interested in knowing whether it is possible with a regular expression alone.
For example I currently have the following expression:
(\w+\b)(?!.*\1)
and the following string:
glass shoes door window door glasses. window glasses
For the most part the expression works and matches the following words:
shoes
door
window
glasses
There are two issues with this:
"glass" is being matched as a substring of "glasses" and is therefore dropped; this is incorrect.
"glasses" and "glasses." should match but currently do not.
The final match should be:
shoes
door
window
glasses
glass
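For reference, the behaviour described in the question can be reproduced with a short script. This is only a sketch and assumes Python's re module, since the question does not name a regex engine; the variable names are made up for illustration.

```python
import re

# Sample string and expression copied from the question.
text = "glass shoes door window door glasses. window glasses"
original = re.compile(r"(\w+\b)(?!.*\1)")

# findall returns the contents of group 1 for every match.
matches = original.findall(text)
print(matches)

# "glass" is missing: the lookahead finds "glass" later on inside
# "glasses", because \1 is not anchored to word boundaries.
```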
If you want . to match really everything, including newlines, you need to enable "dot-matches-all" mode in your regex engine of choice (for example, add the re.DOTALL flag in Python, or /s in PCRE).
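A minimal Python sketch of the difference; the sample text and variable names are assumptions chosen for illustration:

```python
import re

text = "first line\nsecond line"

# By default . does not match a newline, so the search cannot cross lines.
without_flag = re.search(r"first.*second", text)

# With re.DOTALL the dot also matches "\n".
with_flag = re.search(r"first.*second", text, flags=re.DOTALL)

print(without_flag)        # None
print(with_flag.group(0))  # the match spans both lines
```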
But if you wish to match an exact word, the more elegant way is to use \b. In this case, the following pattern will match the exact phrase '123456'.
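The pattern itself is not shown above, so the sketch below assumes it looks like \b123456\b; the sample text is also made up. It shows how \b prevents a match inside a longer run of word characters:

```python
import re

text = "id 123456 and 1234567"

# Assumed pattern: \b on both sides requires a word boundary,
# so the digits inside "1234567" are not matched.
exact = re.findall(r"\b123456\b", text)

# Without \b the same digits are also found inside "1234567".
loose = re.findall(r"123456", text)

print(exact)
print(loose)
```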
\/ is an escaped forward slash. The escape says the forward slash isn't acting as a control character (such as a pattern delimiter), but that you actually want a literal forward slash. . matches any one character, and the following + says "one or more of whatever immediately preceded this".
To search for distinct words in multiline text, use [\s\S] instead of . and wrap both the word and the backreference in \b:
(\b\w+\b)(?![\s\S]*\b\1\b)
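As a quick check, here is a sketch that runs this expression against the string from the question, again assuming Python's re module. The second sample string is an assumption added to show the multiline case.

```python
import re

distinct = re.compile(r"(\b\w+\b)(?![\s\S]*\b\1\b)")

# String from the question: each word is reported at its last occurrence.
text = "glass shoes door window door glasses. window glasses"
words = distinct.findall(text)
print(words)

# [\s\S] lets the lookahead see past newlines without any flags.
multiline = "door\nwindow\ndoor"
multiline_words = distinct.findall(multiline)
print(multiline_words)
```

Note that the words come back in order of their last occurrence, so the result contains the same five words as the expected list in the question, just in a different order.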