
Regular expression - match all words but match unique words only once

Tags: regex

Is it possible to use a regular expression to match all words, but match each unique word only once? I am aware there are other ways of doing this; however, I'm interested in knowing whether it is possible with a regular expression alone.

For example, I currently have the following expression:

(\w+\b)(?!.*\1)

and the following string:

glass shoes door window door glasses. window glasses

For the most part the expression works and matches the following words:

shoes
door 
window
glasses

There are two issues with this:

  1. "glass" is treated as a duplicate because it appears as a substring of "glasses"; this is incorrect.

  2. "glasses" and "glasses." should match but currently do not.
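The behaviour described above can be reproduced with a short sketch. It assumes Python's re module (the question does not name a regex engine), and find_matches is a hypothetical helper name. It suggests why "glass" is skipped: the lookahead .*\1 is satisfied by the letters "glass" inside the later "glasses".

```python
import re

# The expression from the question: a word, not followed anywhere
# later on the same line by the same sequence of characters.
ORIGINAL = re.compile(r"(\w+\b)(?!.*\1)")

def find_matches(text):
    """Return the text captured by group 1 for every match."""
    return [m.group(1) for m in ORIGINAL.finditer(text)]

text = "glass shoes door window door glasses. window glasses"
print(find_matches(text))  # ['shoes', 'door', 'window', 'glasses']
```

Each word is reported at its last occurrence, which is why "door" and "window" appear only once, while "glass" never appears at all.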

The final match should be:

shoes 
door 
window 
glasses 
glass 
Isomorph asked Dec 27 '12 21:12




1 Answer

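To find distinct words in multiline text, use [\s\S] instead of ., because . does not match newlines by default. Wrapping both the word and the backreference in \b ensures that only whole words are compared: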

(\b\w+\b)(?![\s\S]*\b\1\b)
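
As a quick check, the pattern can be tried in Python. This is a minimal sketch under the assumption that Python's re module is the engine; distinct_words and DOT_VARIANT are hypothetical names introduced only for illustration. DOT_VARIANT keeps the word boundaries but uses . in the lookahead, to show what changes across line breaks.

```python
import re

# Pattern from the answer: a whole word not followed, anywhere later
# in the text (including later lines), by the same whole word.
DISTINCT = re.compile(r"(\b\w+\b)(?![\s\S]*\b\1\b)")

# Same idea with '.' in the lookahead; '.' stops at a newline by default.
DOT_VARIANT = re.compile(r"(\b\w+\b)(?!.*\b\1\b)")

def distinct_words(text, pattern=DISTINCT):
    """Return each distinct word once, at its last occurrence."""
    return [m.group(1) for m in pattern.finditer(text)]

text = "glass shoes door window door glasses. window glasses"
print(distinct_words(text))
# ['glass', 'shoes', 'door', 'window', 'glasses']

multiline = "door\nwindow\ndoor"
print(distinct_words(multiline))               # ['window', 'door']
print(distinct_words(multiline, DOT_VARIANT))  # ['door', 'window', 'door']
```

The result contains the same five words the question asks for, though in the order of each word's last occurrence rather than the order listed in the question.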
Andrey Lavrukhin answered Oct 15 '22 10:10