
Regular expressions: difference between ((?:[^\"])*) and ([^\"]*)

What is the difference between these two regular expressions? Are they interchangeable?

((?:[^\"])*)


([^\"]*)

Background to this question:

The JavaScript WYSIWYG editor TinyMCE fails to parse my HTML code in Firefox (23.0.1 and 25.0a2) but works in Chrome.

I found the regular expression to blame:

attrRegExp = /([\w:\-]+)(?:\s*=\s*(?:(?:\"((?:[^\"])*)\")|(?:\'((?:[^\'])*)\')|([^>\s]+)))?/g;

which I modified, replacing

((?:[^\"])*) 

with

([^\"]*)

and

((?:[^\'])*) 

with

([^\']*)

The resulting regular expression works in both browsers for my test case:

attrRegExp = /([\w:\-]+)(?:\s*=\s*(?:(?:\"([^\"]*)\")|(?:\'([^\']*)\')|([^>\s]+)))?/g
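For readers unfamiliar with the pattern, here is a minimal sketch of how its capture groups map to attribute names and values. `parseAttributes` is a hypothetical helper written for illustration only; it is not part of TinyMCE, and the sample input is made up.

```javascript
// The modified expression from the question.
// Group 1: attribute name
// Group 2: value inside double quotes
// Group 3: value inside single quotes
// Group 4: unquoted value
const attrRegExp = /([\w:\-]+)(?:\s*=\s*(?:(?:\"([^\"]*)\")|(?:\'([^\']*)\')|([^>\s]+)))?/g;

// Hypothetical helper: collect name/value pairs from an attribute string.
function parseAttributes(text) {
  const attrs = [];
  let m;
  attrRegExp.lastIndex = 0; // the g flag makes exec() stateful, so reset first
  while ((m = attrRegExp.exec(text)) !== null) {
    // Exactly one of groups 2-4 is set when the attribute has a value.
    const value = m[2] !== undefined ? m[2] : m[3] !== undefined ? m[3] : m[4];
    attrs.push({ name: m[1], value: value });
  }
  return attrs;
}

console.log(parseAttributes('alt="" src=\'a.png\' width=10 hidden'));
```

An attribute without a value, such as `hidden` above, leaves groups 2 to 4 unset, so its `value` is `undefined`.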

Can someone shed some light on that?

My test data, which only works with the modified regular expression, is a big image (>700 kB), like:

var testdata = '<img alt="" src="data:image/jpeg;base64,/9j/4AAQSkZJRgA...5PmDk4FOGOHy6S3JW120W1uCJ5M0PBa54edOFAc8ePX/2Q==">'

I test it with something like:

testdata.match(attrRegExp);

The unmodified regex is especially likely to fail in Firefox when the test data is big.
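A rough sketch of such a comparison is shown here. The real base64 image is elided in the question, so `makeTestData` (a hypothetical helper) builds a synthetic payload of similar size instead; `runMatch` is likewise only an illustration and assumes that a failing engine either throws or returns `null`.

```javascript
// Original and modified expressions from the question.
const original = /([\w:\-]+)(?:\s*=\s*(?:(?:\"((?:[^\"])*)\")|(?:\'((?:[^\'])*)\')|([^>\s]+)))?/g;
const modified = /([\w:\-]+)(?:\s*=\s*(?:(?:\"([^\"]*)\")|(?:\'([^\']*)\')|([^>\s]+)))?/g;

// Hypothetical stand-in for the large image: a data URI padded with "A"s.
function makeTestData(size) {
  return '<img alt="" src="data:image/jpeg;base64,' + 'A'.repeat(size) + '">';
}

// Run match() defensively: some engines throw when an internal limit is hit.
function runMatch(re, text) {
  try {
    return text.match(re);
  } catch (e) {
    return null;
  }
}

const testdata = makeTestData(700 * 1024);
const withOriginal = runMatch(original, testdata);
const withModified = runMatch(modified, testdata);

console.log('original matched:', withOriginal !== null);
console.log('modified matched:', withModified !== null);
```

Whether the original expression matches depends on the engine and its version, so the first line of output may differ between browsers.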

You can find the jsfiddle example here:

asked Sep 13 '13 13:09 by key_



1 Answer

There should be no difference in the result: both patterns match the same strings and capture the same text. So you should be fine.
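As a quick sanity check of that claim, the sketch below compares the captured text of both variants on a few made-up sample strings. The names `grouped`, `plain`, and `capture` are chosen here for illustration.

```javascript
// Two ways to capture "zero or more characters that are not a double quote".
const grouped = /"((?:[^"])*)"/; // non-capturing group repeated by *
const plain = /"([^"]*)"/;       // character class repeated by *

const samples = ['alt=""', 'src="a.png"', 'title="two words"', 'no quotes here'];

// Return the text of capture group 1, or null when there is no match.
function capture(re, text) {
  const m = re.exec(text);
  return m ? m[1] : null;
}

const sameResults = samples.every(function (s) {
  return capture(grouped, s) === capture(plain, s);
});

console.log(sameResults); // true
```

A handful of samples is not a proof, of course; it only illustrates that the extra group changes how the engine works, not what it matches.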

However, there can be a big difference in how RegExp engines process these two expressions, and in the case of Firefox/Safari you just proved there actually is ;)

Firefox uses YARR, the regular expression engine from WebKit/JavaScriptCore. YARR imposes an arbitrary, artificial limit on the number of match calls, and the non-capturing group variant hits it:

// The below limit restricts the number of "recursive" match calls in order to
// avoid spending exponential time on complex regular expressions.
static const unsigned matchLimit = 1000000;

As such, Safari is affected as well.

See the relevant WebKit bug and the relevant Firefox bug, as well as the nice test case comparing different expression types that somebody put together.

answered Oct 11 '22 09:10 by nmaier