I've been doing a lot of reading on .NET regular expressions, and I have written a regular expression whose behavior I can't make sense of.
(src|href)="\w+|(\w+/)+
The way I read this regular expression:
This is meant to match something like 'src="Folder', 'src="folder/', 'href="Folder/SubFolder/', etc.
Input:
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
Using this regular expression, with this input, there is one match.
org/1999/
Can anyone explain this? Neither src nor href appears anywhere in the input, so how can there be any match at all?
What's happening here is that the | is separating the regex into two completely separate alternatives. That is, it matches either (src|href)="\w+ OR (\w+/)+, and it is the second alternative that matches:
org/1999/
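To see this outside a regex tester, here is a minimal sketch that runs the original pattern against the input from the question. It uses Python's re module rather than .NET, on the assumption that both engines treat top-level alternation the same way for this pattern; the variable names are only illustrative.

```python
import re

# The three input lines from the question.
html = '''<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>'''

# Without extra parentheses, | splits the WHOLE pattern in two:
#   alternative 1: (src|href)="\w+
#   alternative 2: (\w+/)+
pattern = re.compile(r'(src|href)="\w+|(\w+/)+')

# Collect the full text of every match.
matches = [m.group(0) for m in pattern.finditer(html)]
print(matches)  # ['org/1999/']
```

The only match comes from the second alternative: "org/" and "1999/" are each a run of word characters followed by a slash, so (\w+/)+ matches them without src or href appearing anywhere.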
In your case you'd probably need to put the last part in parentheses to make it clear what exactly the alternation | refers to:
(src|href)="(\w+|(\w+/)+)
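As a rough check of the parenthesized pattern, the sketch below again uses Python's re module as a stand-in for .NET (an assumption, although both are backtracking engines that try alternatives left to right). The sample strings and the reordered variant are illustrative additions, not part of the original answer.

```python
import re

# The parenthesized pattern: the | now applies only inside the last group.
grouped = re.compile(r'(src|href)="(\w+|(\w+/)+)')

# Hypothetical variant with the two alternatives swapped, so the
# folder-path branch is tried before the single-word branch.
reordered = re.compile(r'(src|href)="((\w+/)+|\w+)')

doc = '<html xmlns="http://www.w3.org/1999/xhtml">'
tag = '<a href="Folder/SubFolder/">'  # made-up sample input

# The stray "org/1999/" match is gone: src or href is now required.
no_match = grouped.search(doc)

# Alternatives are tried left to right, so \w+ wins and the match
# stops at the first slash.
grouped_hit = grouped.search(tag).group(0)

# With the order swapped, the whole folder path is matched.
reordered_hit = reordered.search(tag).group(0)

print(no_match)       # None
print(grouped_hit)    # href="Folder
print(reordered_hit)  # href="Folder/SubFolder/
```

So the extra parentheses remove the unexpected match, but if the goal is to capture paths such as 'href="Folder/SubFolder/' in full, putting the (\w+/)+ branch first may be closer to what was intended.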
Btw I used Expresso to help work this out.