Imagine this is a part of a large text:
stuff (word1/Word2/w0rd3) stuff, stuff (word4/word5) stuff/stuff (word6) stuff (word7/word8/word9) stuff / stuff, (w0rd10/word11) stuff stuff (word12) stuff (Word13/w0rd14/word15) stuff-stuff stuff (word16/word17).
I want the words inside the parentheses. The result must match:
word1
Word2
w0rd3
word4
word5
word6
word7
word8
word9
w0rd10
word11
word12
Word13
w0rd14
word15
word16
word17
Also the result should not be like:
(word1) or (word1/Word2/w0rd3)
Basically, no (, ) or / is allowed in the matches.
What I have tried:
\((\w+)\/(\w+)\/(\w+)\)[^(]*\((\w+)\/(\w+)\)[^(]*\((\w+)\)
(Tested on regex101.)
This matches those words, but I have to repeat the pattern once for every group of words, which is not clean. I also tried txt2re, but its result is just as repetitive and is not a one-line regex. I need a short, one-line regex so that I can use it in an online regex evaluator when no coding environment is available. My preferred engines are Python and C#.
Update:
I have added some / characters to the text. Also, sorry for changing the accepted answer. All the answers are correct in some way, but I have to choose the fastest and most efficient regex here.
A common solution is to check whether there is a closing ) ahead without any opening ( in between.
\w+\b(?=[^)(]*\))
See this demo at regex101
\w+ matches one or more word characters, followed by a \b word boundary
(?=[^)(]*\)) looks ahead for a closing ) with no ( or ) in between
So this pattern does not check for an opening ( before the word, but often that's not needed.
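Since Python is one of the preferred engines, here is a minimal sketch of how the lookahead pattern could be applied there. The names TEXT, INSIDE_PARENS and words_in_parens are made up for this illustration, and TEXT is just the sample from the question.

```python
import re

# Sample text copied from the question.
TEXT = (
    "stuff (word1/Word2/w0rd3) stuff, stuff (word4/word5) stuff/stuff (word6) "
    "stuff (word7/word8/word9) stuff / stuff, (w0rd10/word11) stuff stuff (word12) "
    "stuff (Word13/w0rd14/word15) stuff-stuff stuff (word16/word17)."
)

# A word, then a lookahead that must reach a closing ")" without
# passing over another "(" or ")" on the way.
INSIDE_PARENS = re.compile(r"\w+\b(?=[^)(]*\))")


def words_in_parens(text):
    """Return every word that is followed by a ')' before any other parenthesis."""
    return INSIDE_PARENS.findall(text)


for word in words_in_parens(TEXT):
    print(word)
```

Because the opening ( is never checked, a stray closing parenthesis also counts: for the input "a) b" this sketch returns the word a. If the parentheses in your text are always balanced, that should not matter.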
Instead of matching the words, you can write a regex that matches the non-words, and split by the regex:
\)?[^)]+?\(|\).+|/
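A non-word is either:
\)?[^)]+?\( an optional closing ), then the shortest run of non-) characters up to and including the next opening (
\).+ a closing ) followed by the rest of the text
/ a slash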
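The split approach could look roughly like this in Python. This is only a sketch: TEXT, NON_WORD and split_words are names invented for the example, and the filtering of empty strings is an addition that the split regex itself does not provide.

```python
import re

# Sample text copied from the question.
TEXT = (
    "stuff (word1/Word2/w0rd3) stuff, stuff (word4/word5) stuff/stuff (word6) "
    "stuff (word7/word8/word9) stuff / stuff, (w0rd10/word11) stuff stuff (word12) "
    "stuff (Word13/w0rd14/word15) stuff-stuff stuff (word16/word17)."
)

# The separator pattern from the answer: everything that is NOT a wanted word.
NON_WORD = re.compile(r"\)?[^)]+?\(|\).+|/")


def split_words(text):
    """Split on the non-word separators and drop the empty edge pieces."""
    # re.split yields '' where a separator touches the start or end of the text.
    return [piece for piece in NON_WORD.split(text) if piece]


for word in split_words(TEXT):
    print(word)
```

One caveat of this sketch: the \).+ branch needs at least one character after the last closing parenthesis. If the text ends directly with ), that parenthesis stays attached to the last word.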