How to validate the last character in a capture group with regex

I want to extract the path from a URL, and I want to use a regex to do it.

I'm using this regex: ^(?:https?:\/\/)?(?:[^@\n]+@)?(?:www\.)?([^:\n\?\=ֿ\#]+)

One side effect is that a trailing / is captured as well.

For example (input → captured group):

domain.com/home/ → domain.com/home/
domain.com/home?param=value → domain.com/home
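The side effect can be reproduced with a minimal sketch using Python's `re` module (Python is only an illustrative choice, since the question does not name a language). The sketch assumes the stray `ֿ` character (U+05BF) inside the character class is a keyboard slip and leaves it out, and it drops the backslashes that are unnecessary inside a character class; neither change affects these inputs. The helper name `captured` is hypothetical.

```python
import re

# The question's pattern, minus the stray U+05BF character and the
# unnecessary backslashes inside the negated character class.
ORIGINAL = re.compile(r"^(?:https?://)?(?:[^@\n]+@)?(?:www\.)?([^:\n?=#]+)")


def captured(url: str) -> str:
    """Return capture group 1 of ORIGINAL, or "" if the pattern does not match."""
    match = ORIGINAL.match(url)
    return match.group(1) if match else ""


for url in ["domain.com/home/", "domain.com/home?param=value"]:
    # Group 1 stops at ?, but nothing stops it from ending on a /.
    print(f"{url!r} -> {captured(url)!r}")
```

The first line printed still shows the trailing / inside group 1, because / is not in the negated class.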

How can I make sure that the last character of a specific capture group is not /?

Note: I know I can solve this with a second regex match, but I assume it can be done with a single regex.

Asked Oct 22 '20 at 10:10 by Matan Sanbira





1 Answer

One way is to add / to the negated character class so that it is not matched there, and to match a / only when it is directly followed by a character other than / or whitespace.

^(?:https?:\/\/)?(?:[^@\n]+@)?(?:www\.)?((?:[^:\n?=ֿ#\/]+|\/(?=[^\s\/])[^:\n?=ֿ#\/]*)*)

The last part of the pattern matches:

  • ( Capture group 1
    • (?: Non-capturing group
      • [^:\n?=ֿ#\/]+ Match 1+ chars other than the listed ones, which now include /
      • | Or
      • \/(?=[^\s\/]) Match / only when it is directly followed by a char other than / or a whitespace char
      • [^:\n?=ֿ#\/]* Match optional chars other than the listed ones
    • )* Close the non-capturing group and repeat it 0+ times, so that multiple /-separated segments can be matched
  • ) Close group 1

Regex demo
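The answer's pattern can be checked with a short Python sketch (Python is an illustrative choice; the answer does not name a language). It leaves out the stray `ֿ` character (U+05BF), assumed to be a keyboard slip, and writes / unescaped because Python does not require escaping it. The helper name `host_and_path` and the last two inputs are hypothetical, chosen only for illustration.

```python
import re

# The answer's pattern: inside group 1, a "/" is consumed only when the next
# character is neither "/" nor whitespace, so a trailing "/" is left out.
ANSWER = re.compile(
    r"^(?:https?://)?(?:[^@\n]+@)?(?:www\.)?"
    r"((?:[^:\n?=#/]+|/(?=[^\s/])[^:\n?=#/]*)*)"
)


def host_and_path(url: str) -> str:
    """Return capture group 1 of ANSWER, or "" if the pattern does not match."""
    match = ANSWER.match(url)
    return match.group(1) if match else ""


examples = [
    "domain.com/home/",             # trailing slash from the question
    "domain.com/home?param=value",  # query string from the question
    "https://www.domain.com/a/b/",  # scheme, www. and two path segments
    "domain.com",                   # no path at all
]
for url in examples:
    print(f"{url!r} -> {host_and_path(url)!r}")
```

Each output is the host plus path with no trailing /, for example 'domain.com/home' for both of the question's inputs.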

Answered Sep 22 '22 at 14:09 by The fourth bird

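One input the answer's pattern does not cover: when a / sits directly before the query string, as in domain.com/home/?param=value, the lookahead sees ? (neither / nor whitespace), so that slash is still captured. A different technique, not part of the answer, is to make group 1 end with a character class that excludes /, for example ([^:\n?=#]*[^:\n?=#/]). The greedy * then gives back characters until the group's last character is something other than /, which states "the last character of the group is not /" directly. The Python sketch below (Python chosen only for illustration) compares the two. It omits the stray `ֿ` character (U+05BF) from both patterns on the assumption that it is a keyboard slip, and the names `group1` and `LAST_CHAR_NOT_SLASH` are hypothetical.

```python
import re
from typing import Optional

# The answer's pattern (stray U+05BF omitted), kept for comparison.
ANSWER = re.compile(
    r"^(?:https?://)?(?:[^@\n]+@)?(?:www\.)?"
    r"((?:[^:\n?=#/]+|/(?=[^\s/])[^:\n?=#/]*)*)"
)

# Alternative (not from the answer): group 1 must end with a char that is
# not "/", so the greedy [^...]* backtracks past any trailing slashes.
LAST_CHAR_NOT_SLASH = re.compile(
    r"^(?:https?://)?(?:[^@\n]+@)?(?:www\.)?([^:\n?=#]*[^:\n?=#/])"
)


def group1(pattern: re.Pattern, url: str) -> Optional[str]:
    """Return capture group 1, or None when the pattern does not match."""
    match = pattern.match(url)
    return match.group(1) if match else None


for url in ["domain.com/home/", "domain.com/home/?param=value", "domain.com//"]:
    print(f"{url!r}: answer={group1(ANSWER, url)!r}, "
          f"alternative={group1(LAST_CHAR_NOT_SLASH, url)!r}")
```

The trade-off is that the alternative needs at least one character in group 1, so an input such as / produces no match at all, whereas the answer's pattern matches it with an empty group.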