JavaScript regular expression: match anything up until something (if it exists)

I am new to regular expressions, and this may be a very easy question (hopefully).

I am trying to use one solution for three kinds of strings:

  • "45%", expected result: "45"
  • "45", expected result: "45"
  • "", expected result: ""

What I have tried (where str is the input string):

str.match(/(.*)(?!%*)/i)[1] 

In my head, this reads as "match anything up until '%' if it is found; otherwise, just match anything".

In Firebug's head, it seems to read more like "just match anything and completely disregard the negative lookahead". Making it lazy with (.*)? doesn't seem to help either.
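Running the attempted pattern, and a variant with the * removed from the lookahead, shows what actually goes wrong. This is a minimal sketch assuming a current, spec-compliant JavaScript engine (Node.js or a modern browser console); the variable names are illustrative only.

```javascript
// Sketch: why the attempted pattern does not do what was intended.
const str = "45%";

// 1. The original attempt. %* can match the empty string, so %* always
//    succeeds, which means the negative lookahead (?!%*) always fails.
//    The whole pattern can therefore never match and match() returns null
//    (indexing that null with [1] would throw a TypeError).
const original = str.match(/(.*)(?!%*)/i);
console.log(original); // logs: null

// 2. Without the * in the lookahead: the greedy (.*) first grabs all of
//    "45%". At the end of the string nothing follows, so (?!%) succeeds
//    right away. The lookahead is only checked after .* has already
//    consumed the %.
const withoutStar = str.match(/(.*)(?!%)/);
console.log(withoutStar[1]); // logs: 45%
```

In other words, a lookahead placed after a greedy .* only inspects what comes after the characters .* has already taken; it does not stop .* from taking them.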

Let's forget for a second that in this specific situation I am only matching numbers, so /\d*/ would do. I am trying to understand a general rule that I can apply anywhere.

Would anybody be so kind as to help me out?

asked Dec 21 '11 03:12 by undefinederror




2 Answers

How about the simpler

str.match(/[^%]*/i)[0] 

This matches zero or more characters that are not %.
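Applied to the three inputs from the question, the negated character class gives the expected results. A minimal sketch; the i flag is dropped because % has no letter case, and inputs and results are illustrative names:

```javascript
// Sketch: the negated-class answer applied to the question's three inputs.
// [^%]* consumes characters until the first "%" or the end of the string.
// Because it may match zero characters, match() never returns null here,
// not even for the empty string.
const inputs = ["45%", "45", ""];
const results = inputs.map((s) => s.match(/[^%]*/)[0]);

console.log(JSON.stringify(results)); // logs: ["45","45",""]
```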


Edit: If you need to parse until </a>, you can match a sequence of characters followed by </a> and then discard the </a>. That means using a positive lookahead instead of a negative one.

str.match(/.*?(?=<\/a>|$)/i)[0] 

This means: match zero or more characters lazily, until reaching </a> or the end of the string.
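A small sketch of the lazy match with a lookahead, wrapped in a helper. The name untilClosingA is hypothetical, chosen only for this illustration:

```javascript
// Sketch: lazy match up to "</a>" or the end of the string.
// .*? grows one character at a time; after each step the lookahead
// (?=<\/a>|$) checks whether "</a>" or the end of input comes next.
// The i flag lets the lookahead also recognise "</A>".
// untilClosingA is a hypothetical helper name.
const untilClosingA = (s) => s.match(/.*?(?=<\/a>|$)/i)[0];

console.log(untilClosingA("link text</a> more")); // logs: link text
console.log(untilClosingA("no tag here"));        // logs: no tag here
console.log(untilClosingA("UPPER</A>"));          // logs: UPPER
```

Because . does not match line terminators, input that spans several lines would need [\s\S] in place of . (or the s flag, available since ES2018).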

Note that *? is a single (lazy) quantifier, so (.*)? is not the same as .*?. In (.*)? the ? only makes the group optional, and the .* inside it stays greedy.
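The difference between the two constructs is easy to see on the question's "45%" input. A minimal sketch; the variable names are illustrative:

```javascript
// Sketch: (.*)? and .*? are different constructs.
const s = "45%";

// (.*)? : the ? makes the group optional, but the .* inside it is
// still greedy, so it takes the whole string.
const optionalGroup = s.match(/(.*)?/)[0];

// .*? : one lazy quantifier. With nothing after it in the pattern,
// zero characters already satisfy it, so it matches the empty string.
const lazy = s.match(/.*?/)[0];

// .*? becomes useful once something follows it: here it stops at the
// first "%", and the capture group holds what came before.
const lazyThenPercent = s.match(/(.*?)%/)[1];

console.log(JSON.stringify(optionalGroup));   // logs: "45%"
console.log(JSON.stringify(lazy));            // logs: ""
console.log(JSON.stringify(lazyThenPercent)); // logs: "45"
```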

(And don't parse HTML with a single regex, as usual.)

answered Oct 29 '22 06:10 by kennytm



I think this is what you're looking for:

/(?:(?!%).)*/ 

The . matches any character, but only after the negative lookahead, (?!%), confirms that the character is not %. Note that when the sentinel is a single character like %, you can use a negated character class instead, for example:
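A quick check that the tempered dot and the negated class agree on the question's inputs, plus one extra string with two % signs. A minimal sketch; the names are illustrative:

```javascript
// Sketch: for a single-character sentinel, the "tempered dot"
// (?:(?!%).)* and the negated class [^%]* give the same result.
// (One difference: . does not match line terminators while [^%] does,
// so the two diverge on multi-line input.)
const tempered = /(?:(?!%).)*/;
const negated = /[^%]*/;

const inputs = ["45%", "45", "", "12%34%"];
const rows = inputs.map((s) => ({
  input: s,
  tempered: s.match(tempered)[0],
  negated: s.match(negated)[0],
}));

for (const row of rows) {
  console.log(JSON.stringify(row));
}
```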

/[^%]*/ 

But for a multi-character sentinel like </a>, you have to use the lookahead approach:

/(?:(?!<\/a>).)*/i 

This is actually saying "Match zero or more characters one at a time, but if the next character turns out to be the beginning of the sequence </a> or </A>, stop without consuming it".
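The "general rule" the question asks for can be packaged as a helper that builds the tempered pattern for any literal sentinel. This is a sketch under stated assumptions: matchUntil and escapeRegExp are hypothetical names, not part of any library, and [\s\S] is used instead of . so that newlines are consumed too.

```javascript
// Hypothetical helper: escape regex metacharacters so the sentinel
// is matched literally (e.g. "." must not mean "any character").
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: return everything before the first occurrence of
// `sentinel`, or the whole string if it does not occur. Builds the
// tempered pattern (?:(?!sentinel)[\s\S])*.
function matchUntil(str, sentinel, flags = "") {
  const re = new RegExp("(?:(?!" + escapeRegExp(sentinel) + ")[\\s\\S])*", flags);
  return str.match(re)[0];
}

console.log(matchUntil("45%", "%"));                          // logs: 45
console.log(matchUntil("click here</A> rest", "</a>", "i"));  // logs: click here
console.log(JSON.stringify(matchUntil("line1\nline2</a>", "</a>"))); // logs: "line1\nline2"
```

Because the pattern is built from a string with new RegExp, the / in </a> does not need escaping here, unlike in a regex literal.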

answered Oct 29 '22 06:10 by Alan Moore
