Regex to match the first occurrence of a string

I have this string:

City - This is some text. This is some more - and continues here.

I would like to split the string at the first ' - ' to get 'City' (just a sample word; it can be other words as well), and also to get the rest of the text after the ' - '.

I constructed this expression:

(^[\D\W\S]*)( - )([\D\W\S]*) 

But this finds the last occurrence of ' - ' instead of the first one.

How can I make it stop at the first occurrence?

Gijs, asked May 12 '12 20:05



1 Answer

The simplest solution would be to explicitly forbid the dash to be part of the first group:

^([^-]*) - (.*) 

Explanation:

^        # Start of string
([^-]*)  # Match any number of characters except dashes
\ -\     # Match a dash (surrounded by spaces)
(.*)     # Match anything that follows
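As a minimal sketch of how this first pattern behaves, here it is applied to the sample string from the question. Python is used only for illustration (the question does not name a language), and the helper name `split_no_dash` is hypothetical.

```python
import re

# Pattern from the answer: the first group may not contain any dash at all.
FIRST_DASH = re.compile(r"^([^-]*) - (.*)")


def split_no_dash(text):
    """Return (head, rest) split at the first ' - ', or None if there is no match."""
    match = FIRST_DASH.match(text)
    return match.groups() if match else None


sample = "City - This is some text. This is some more - and continues here."
result = split_no_dash(sample)
print(result)

# A dash inside the first word (not surrounded by spaces) defeats this pattern.
print(split_no_dash("Well-known - text"))
```

The second call returns `None`, which is the limitation this pattern has whenever the first part can itself contain a dash.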

However, this would fail if your string could contain a dash in the first group (just not surrounded by spaces). If that's the case, then you can make use of lazy quantifiers:

^(.*?) - (.*) 

Explanation:

^        # Start of string
(.*?)    # Match any number of characters, as few as possible
\ -\     # Match a dash (surrounded by spaces)
(.*)     # Match anything that follows
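To see the difference the lazy quantifier makes, the sketch below runs the lazy pattern next to a greedy variant on the sample string. It assumes Python's `re` module; the greedy pattern `^(.*) - (.*)` is included only for comparison and stands in for the behaviour described in the question.

```python
import re

LAZY = re.compile(r"^(.*?) - (.*)")   # pattern from the answer
GREEDY = re.compile(r"^(.*) - (.*)")  # comparison only: greedy first group

sample = "City - This is some text. This is some more - and continues here."

# Lazy: the first group stops at the first ' - '.
lazy_head, lazy_rest = LAZY.match(sample).groups()

# Greedy: the first group runs up to the last ' - '.
greedy_head, greedy_rest = GREEDY.match(sample).groups()

# Unlike a [^-]* first group, the lazy pattern accepts a dash inside the first word.
hyphen_head, hyphen_rest = LAZY.match("Well-known - text").groups()

print(lazy_head, "|", lazy_rest)
print(greedy_head, "|", greedy_rest)
print(hyphen_head, "|", hyphen_rest)
```

Note that `.` does not match a newline by default in Python, so on multi-line input the second group stops at the end of the first line unless `re.DOTALL` is passed.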
Tim Pietzcker, answered Sep 22 '22 03:09