Trying to understand .NET regular expressions

Tags: c#, .net, regex

I've been doing a lot of reading on .NET regular expressions, and I have written a regular expression whose behavior I can't make any sense of.

(src|href)="\w+|(\w+/)+

The way I read this regular expression:

  1. Match exactly "src" or "href"
  2. Followed by ="
  3. Followed by one or more word characters ([a-zA-Z0-9_]), or by one or more repetitions of (one or more word characters followed by /)

This is meant to match something like 'src="Folder', 'src="folder/', 'href="Folder/SubFolder/', etc.

Input:

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>

Using this regular expression, with this input, there is one match.

org/1999/

Can anyone explain this? Neither src nor href appears anywhere in the input, so how can there be any match at all?
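
For anyone who wants to reproduce the result, here is a minimal sketch. It uses Python's re module rather than .NET's Regex class, on the assumption that the two engines treat the constructs in this pattern (alternation, grouping, \w, +) the same way. The names PATTERN, HTML and find_matches are made up for the illustration.

```python
import re

# The pattern and the input exactly as given in the question.
PATTERN = r'(src|href)="\w+|(\w+/)+'
HTML = '''<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
'''


def find_matches(pattern, text):
    """Return the full text of every match of pattern in text."""
    return [m.group(0) for m in re.finditer(pattern, text)]


matches = find_matches(PATTERN, HTML)
print(matches)  # ['org/1999/']
```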

asked Jan 01 '26 11:01 by Sam Rueby


1 Answer

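What's happening here is that the | is separating the regex into two completely separate alternatives, because alternation binds more loosely than everything else in the pattern. That is, the engine selects either (src|href)="\w+ OR (\w+/)+, and it is the second alternative that is being matched: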

org/1999/
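
One way to see the split is to look at which capture groups took part in the match, and to run each half on its own. This is only a sketch: it uses Python's re module as a stand-in for .NET's Regex (assuming alternation precedence is the same in both), and the variable names are hypothetical.

```python
import re

HTML = '''<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
'''

# Original pattern: the top-level | splits it into two alternatives.
original = re.compile(r'(src|href)="\w+|(\w+/)+')
m = original.search(HTML)

whole_match = m.group(0)   # text matched by the pattern as a whole
left_group = m.group(1)    # (src|href): None, the left alternative was not used
right_group = m.group(2)   # (\w+/): holds the last repetition that matched

# Each alternative run by itself (non-capturing groups keep findall simple).
left_only = re.findall(r'(?:src|href)="\w+', HTML)
right_only = re.findall(r'(?:\w+/)+', HTML)

print(whole_match, left_group, right_group)
print(left_only, right_only)
```

Group 1 comes back empty while group 2 is populated, which suggests the src/href half never matched and the whole match came from (\w+/)+.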

In your case you'd probably need to put the last part in parentheses to make it clear what exactly the alternation | refers to:

(src|href)="(\w+|(\w+/)+)
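
A quick check of the parenthesized version, again as a sketch in Python's re module rather than .NET (assuming the same behavior for these constructs). FIXED, HTML and full_matches are names invented for the example, and the sample strings are the ones mentioned in the question.

```python
import re

# Parenthesized pattern: the | now only applies after the =" part.
FIXED = r'(src|href)="(\w+|(\w+/)+)'

HTML = '''<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
'''


def full_matches(text):
    """Return the full text of every match of FIXED in text."""
    return [m.group(0) for m in re.finditer(FIXED, text)]


on_html = full_matches(HTML)                          # no src/href, so no match
on_src = full_matches('src="Folder')
on_href = full_matches('href="Folder/SubFolder/')

print(on_html)  # []
print(on_src)   # ['src="Folder']
print(on_href)  # ['href="Folder']
```

Note that in this sketch the last sample stops at href="Folder: the \w+ alternative is tried first and succeeds, so the (\w+/)+ alternative is never attempted. If the whole path is wanted, the order of the two alternatives may need rethinking.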

Btw I used Expresso to help work this out.

answered Jan 03 '26 04:01 by m.edmondson


