I'm trying to understand this regex, can you help me out?
(?s)\\{\\{wotd\\|(.+?)\\|(.+?)\\|([^#\\|]+).*?\\}\\}
The parts I don't understand:
(?s)
\\ before }
(.+?) (should we read this like: the ., then + acting on the ., then ? responding to the result of .+ ?)
This regex is taken from a string literal, which is why every backslash is doubled. The "canonical" regex is:
(?s)\{\{wotd\|(.+?)\|(.+?)\|([^#\|]+).*?\}\}
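To make the "from a string" point concrete, here is a minimal sketch, assuming the regex lives in a Java string literal. The class name, the helper and the sample input are made up for illustration; they are not from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WotdFromString {
    // The regex exactly as written in the question, inside a Java string literal.
    // The compiler turns each doubled backslash into a single one.
    static final String REGEX = "(?s)\\{\\{wotd\\|(.+?)\\|(.+?)\\|([^#\\|]+).*?\\}\\}";

    // Returns the three captured fields, or null if nothing matches.
    static String[] fields(String input) {
        Matcher m = Pattern.compile(REGEX).matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        // Prints what the regex engine actually receives (single backslashes).
        System.out.println(REGEX);
        // Hypothetical sample input.
        String[] f = fields("{{wotd|serendipity|n|a happy accident#note|May|5}}");
        System.out.println(String.join(" / ", f));
    }
}
```

Printing REGEX should show the "canonical" form, and the three groups come out as the word, the second field, and the third field up to the first "#".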
The DOTALL modifier, (?s), means that the dot can also match a newline character. Negated character classes can match a newline as well, at least with Java: for instance, [^a] will match each and every character which is not an "a", newline included. Some regex engines do NOT match a newline in negated character classes though (this can be regarded as a bug).
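A quick way to check the DOTALL behaviour yourself is a sketch like the following, assuming java.util.regex (the class and method names are only illustrative):

```java
import java.util.regex.Pattern;

class DotallDemo {
    // True if the whole input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String input = "a\nb";
        System.out.println(matches("a.b", input));     // dot without DOTALL
        System.out.println(matches("(?s)a.b", input)); // dot with DOTALL
        System.out.println(matches("a[^x]b", input));  // negated class, no DOTALL
    }
}
```

With Java this should print false, true, true: only the plain dot refuses the newline.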
The +? and *? are lazy quantifiers (which should generally be avoided). Being lazy means that, before swallowing each character, they first have to check whether the rest of the regex can already match at that position, and they take one more character only if it cannot.
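To see the difference between greedy, lazy and a negated character class, here is a small sketch (Java assumed; the names and the sample input are made up):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyVsGreedy {
    // Returns capture group 1 of the first match, or null if there is none.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "|a|b|c|";
        System.out.println(firstGroup("\\|(.+)\\|", input));     // greedy
        System.out.println(firstGroup("\\|(.+?)\\|", input));    // lazy
        System.out.println(firstGroup("\\|([^|]+)\\|", input));  // negated class
    }
}
```

The greedy version captures "a|b|c", while the lazy one and the negated class both capture "a". The negated class gets there without checking the rest of the regex before every character.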
The { and } are preceded with \ because unescaped braces form the repetition quantifier {n,m}, where n and m are integers; the backslash makes them literal characters.
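A short sketch of braces as a quantifier versus escaped braces as literals (Java assumed, names illustrative):

```java
import java.util.regex.Pattern;

class BraceDemo {
    // True if the whole input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Unescaped: {2,3} means "two or three times the previous item".
        System.out.println(matches("a{2,3}", "aa"));
        System.out.println(matches("a{2,3}", "a"));
        // Escaped: the braces are plain characters.
        System.out.println(matches("a\\{2,3\\}", "a{2,3}"));
        System.out.println(matches("\\{\\{wotd", "{{wotd"));
    }
}
```

This should print true, false, true, true.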
Also, it is useless to escape the pipe | in the character class [^#\|]; it can simply be written as [^#|].
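If you want to convince yourself that the pipe needs no escape inside a character class, something like this sketch should do (Java assumed, names and input made up):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PipeInClass {
    // Returns the first run of characters matched by the regex, or null.
    static String firstRun(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "abc|def#g";
        System.out.println(firstRun("[^#\\|]+", input)); // escaped pipe
        System.out.println(firstRun("[^#|]+", input));   // unescaped pipe
        // Inside a class the pipe is a plain character; outside it means "or".
        System.out.println(Pattern.matches("[a|b]", "|"));
        System.out.println(Pattern.matches("a|b", "|"));
    }
}
```

Both character classes stop at the first "|" and return "abc"; the last two lines print true and false.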
And finally, .*? at the end seems to be there to swallow the rest of the fields. A better alternative is to use the normal* (special normal*)* pattern, where normal is [^|}] and special is \|.
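As a rough illustration of that pattern on its own, here is a sketch that applies only the tail, followed by the closing braces, to some made-up inputs (Java assumed, names hypothetical):

```java
import java.util.regex.Pattern;

class UnrolledTail {
    // normal* (special normal*)* with normal = [^|}] and special = \|,
    // followed by the literal closing "}}".
    static final String TAIL = "[^|}]*(?:\\|[^|}]*)*\\}\\}";

    // True if the whole input is "remaining fields" followed by "}}".
    static boolean matchesTail(String input) {
        return Pattern.matches(TAIL, input);
    }

    public static void main(String[] args) {
        System.out.println(matchesTail("#note|May|5}}")); // several extra fields
        System.out.println(matchesTail("}}"));            // no extra fields
        System.out.println(matchesTail("#note|May"));     // closing braces missing
    }
}
```

This should print true, true, false. Each character is consumed by exactly one part of the pattern, so there is nothing to re-check along the way.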
Here is the regex without lazy quantifiers, with the "fixed" character class and the modified end. Note that the DOTALL modifier has disappeared as well, since the dot isn't used anymore:
\{\{wotd\|([^|]+)\|([^|]+)\|([^#|]+)[^|}]*(?:\|[^|}]*)*\}\}
Step by step:
\{\{ # literal "{{", followed by
wotd # literal "wotd", followed by
\| # literal "|", followed by
([^|]+) # one or more characters which are not a "|" (captured), followed by
\| # literal "|", followed by
([^|]+) # one or more characters which are not a "|" (captured), followed by
\| # literal "|", followed by
([^#|]+) # one or more characters which are not "|" or "#" (captured), followed by
[^|}]* # zero or more characters which are not "|" or "}", followed by
(?: # begin group
\| # a literal "|", followed by
[^|}]* # zero or more characters which are not "|" or "}"
) # end group
* # zero or more times, followed by
\}\} # literal "}}"
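To try the rewritten regex, here is a minimal sketch, again assuming Java. The class name, helper and sample inputs are invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WotdRewritten {
    // The rewritten regex as a Java string literal (backslashes doubled).
    static final Pattern WOTD = Pattern.compile(
        "\\{\\{wotd\\|([^|]+)\\|([^|]+)\\|([^#|]+)[^|}]*(?:\\|[^|}]*)*\\}\\}");

    // Returns the three captured fields, or null if nothing matches.
    static String[] fields(String input) {
        Matcher m = WOTD.matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        // Hypothetical sample input.
        String[] f = fields("{{wotd|serendipity|n|a happy accident#note|May|5}}");
        System.out.println(String.join(" / ", f));
        // A template with another name does not match.
        System.out.println(fields("{{other|a|b|c}}") == null);
    }
}
```

On this input the groups are "serendipity", "n" and "a happy accident"; the "#note|May|5" part is consumed by the tail without being captured.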