Regex: Match until first occurrence met

I am trying to match up to the first occurrence of &. Right now my pattern matches up to the last occurrence of & instead.

My regular expression is

(?!^)(http[^\\]+)\&

And I'm trying to match against this text:

https://www.google.com/url?rct3Dj&sa3Dt&url3Dhttp://business.itbusinessnet.com/article/WorldStage-Supports-Massive-4K-Video-Mapping-at-Adobe-MAX-with-Christie-Boxer-4K-Projectors---4820052&ct3Dga&cd3DCAEYACoTOTEwNTAyMzI0OTkyNzU0OTI0MjIaMTBmYTYxYzBmZDFlN2RlZjpjb206ZW46VVM&usg3DAFQjCNE6oIhIxR6qRMBmLkHOJTKLvamLFg

What I need is:

http://business.itbusinessnet.com/article/WorldStage-Supports-Massive-4K-Video-Mapping-at-Adobe-MAX-with-Christie-Boxer-4K-Projectors---4820052


AmazingDayToday asked Feb 21 '17 00:02



1 Answer

Use the non-greedy mode like this:

/(?!^)(http[^\\]+?)&/
//               ^

In non-greedy (lazy) mode the quantifier matches as few characters as possible, so the match ends at the first & instead of the last one.
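As a rough illustration of the difference, the sketch below runs the greedy pattern from the question and the lazy pattern side by side. The input is a shortened stand-in for the URL in the question (same structure, fewer characters), and the variable names are only illustrative.

```javascript
// Shortened stand-in for the question's URL (assumption: same shape, less text).
const text =
  "https://www.google.com/url?rct3Dj&sa3Dt&url3D" +
  "http://business.itbusinessnet.com/article/4820052" +
  "&ct3Dga&cd3DCAEY&usg3DAFQj";

// Greedy: + grabs as much as it can, then backtracks to the LAST &.
const greedy = text.match(/(?!^)(http[^\\]+)&/)[1];

// Lazy: +? grabs as little as it can, so it stops at the FIRST &.
const lazy = text.match(/(?!^)(http[^\\]+?)&/)[1];

console.log(greedy); // http://business.itbusinessnet.com/article/4820052&ct3Dga&cd3DCAEY
console.log(lazy);   // http://business.itbusinessnet.com/article/4820052
```

In both cases capture group 1 is read, so the trailing & itself is not part of the printed value.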

If you want to get rid of the &, wrap it in a lookahead group so that it is not part of the match, like this:

/(?!^)(http[^\\]+?)(?=&)/
//                 ^^  ^
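The effect of the lookahead only shows up in the full match (index 0 of the result); capture group 1 is the same either way. A minimal sketch, using a made-up input string (`sample` and its hosts are hypothetical, not from the question):

```javascript
// Hypothetical input with the same shape as the question's URL.
const sample =
  "https://redirect.example/url?a3D1&url3Dhttp://target.example/page-42&b3D2&c3D3";

// & is consumed, so it is part of the full match.
const consumed = sample.match(/(?!^)(http[^\\]+?)&/);

// & is only asserted by the lookahead, so the full match stops before it.
const asserted = sample.match(/(?!^)(http[^\\]+?)(?=&)/);

console.log(consumed[0]); // http://target.example/page-42&
console.log(asserted[0]); // http://target.example/page-42
console.log(consumed[1] === asserted[1]); // true
```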

Or you could optimize the regular expression, as @apsillers suggested in the comment below, by excluding & in the character class so that no lazy quantifier or lookahead is needed:

/(?!^)(http[^\\&]+)/

Note: & is not a special character in a regular expression, so you don't need to escape it.
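A short sketch of the negated-character-class version, again on a made-up input (`input` is hypothetical). It also checks that the escaped and unescaped forms of & behave the same in a JavaScript regex without the u flag:

```javascript
// Hypothetical input with the same shape as the question's URL.
const input =
  "https://redirect.example/url?a3D1&url3Dhttp://target.example/page-42&b3D2&c3D3";

// [^\\&]+ cannot cross an &, so a greedy + already stops at the first one.
const found = input.match(/(?!^)(http[^\\&]+)/)[1];
console.log(found); // http://target.example/page-42

// Escaping & is harmless here but unnecessary: both patterns match "&".
const escapedMatches = /\&/.test("&");
const plainMatches = /&/.test("&");
console.log(escapedMatches, plainMatches); // true true
```

The (?!^) at the front is what skips the leading https://... part; without it, the first match would start at the beginning of the string.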

ibrahim mahrir answered Nov 15 '22 13:11