 

Regex URL pattern except specific subsite

Tags: c#, regex

I am working on a web crawler and need a regex that supports the following rules.

Match: all pages starting with

   http://intranet/

But not starting with

    http://intranet/sites/ or http://intranet/search/

And located in the subfolder /Pages/, ending with .aspx

Valid sample: 
http://intranet/products/Pages/default.aspx
Invalid samples:
http://intranet/Pages/sofus/default.aspx
http://intranet/sites/products/Pages/default.aspx
http://intranet/products/Pages/default.aspx#

So far I have come up with this:

 ^http://intranet.*/Pages/.*.aspx+

Any help appreciated.

CADmageren asked Mar 04 '26 09:03


1 Answer

A pattern like this should work:

^http://intranet/(?!sites|search)[^/]+/Pages/.*\.aspx$

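The (?!...) creates what's known as a negative lookahead assertion and ensures that the segment matched by [^/]+ does not start with sites or search. Because [^/]+ cannot cross a slash, /Pages/ must be the second path segment, and the trailing $ rejects anything after .aspx, such as the # in the last invalid sample.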

Here's a demonstration.
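
Separately from the linked demonstration, here is a minimal C# sketch of how the pattern could be used in a crawler filter. The class name UrlFilter and the method name IsCrawlable are hypothetical, chosen only for illustration; the pattern itself is the one given above, unchanged.

```csharp
using System.Text.RegularExpressions;

public static class UrlFilter
{
    // Pattern from the answer, unchanged:
    //   (?!sites|search)  negative lookahead: the first path segment must not start with "sites" or "search"
    //   [^/]+             exactly one path segment (it cannot contain a slash)
    //   /Pages/.*\.aspx$  the Pages subfolder, then anything, ending in ".aspx"
    private static readonly Regex Crawlable = new Regex(
        @"^http://intranet/(?!sites|search)[^/]+/Pages/.*\.aspx$");

    // Hypothetical helper: true when the URL passes the filter.
    public static bool IsCrawlable(string url)
    {
        return Crawlable.IsMatch(url);
    }
}
```

Called on the four samples from the question, UrlFilter.IsCrawlable should return true only for http://intranet/products/Pages/default.aspx. Two things to keep in mind: the match is case-sensitive unless RegexOptions.IgnoreCase is passed, and the lookahead excludes any first segment that merely begins with sites or search (for example a folder named searchtools). If only the exact folders should be excluded, a variant such as (?!(?:sites|search)/) may be closer to the intent.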

p.s.w.g answered Mar 05 '26 22:03


