First, I wrote a regular expression that matches every unique external library path in a list of all header files in a project. I asked a question about building that regexp a week ago.
I then started experimenting to see how it would behave when run asynchronously and when turned into a web worker. For convenience and reliability I created this universal file that runs in all three modes:

    /** Will call result() callback with every match it finds. Asynchronous unless called
     *  with interval = -1.
     *  Javadoc style comment for Arnold Rimmer and other Java programmers:
     *
     * @param regex regular expression to match in string
     * @param string guess what
     * @param result callback function that accepts one parameter, string match
     * @param done callback on finish, has no parameters
     * @param interval delay (not actual interval) between finding matches. If -1,
     *        function will be blocking
     * @property working false if loop isn't running, otherwise contains timeout ID
     *           for use with clearTimeout
     * @property done copy of done parameter
     * @throws heavy boulders
     **/
    function processRegex(regex, string, result, done, interval) {
      var m;
      //Please tell me interpreter optimizes this
      interval = typeof interval!='number'?1:interval;
      //And this
      processRegex.done = done;
      while ((m = regex.exec(string))) {
        Array.prototype.splice.call(m,0,1);
        var path = m.join("");
        //It's good to keep in mind that result() slows down the process
        result(path);
        if (interval>=0) {
          processRegex.working = setTimeout(processRegex, interval,
                                            regex, string, result, done, interval);
          // Comment these out for maximum speed
          processRegex.progress = regex.lastIndex/string.length;
          console.log("Progress: "+Math.round(processRegex.progress*100)+"%");
          return;
        }
      }

      processRegex.working = false;
      processRegex.done = null;
      if (typeof done=="function")
        done();
    }
    processRegex.working = false;

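The worker wiring itself is not shown in the post, so here is only a rough sketch of how the worker mode might look. The message shape (`source`, `flags`, `text`), the `handleRequest` name, and the `match`/`done` message types are all assumptions made for illustration, not the code from the test page.

```javascript
// Hypothetical sketch: not the worker code from the original test page.
// Assumed request shape: { source: string, flags: string, text: string }.
function handleRequest(data, post) {
  // exec() only advances through the string when the g flag is set.
  const flags = data.flags.includes("g") ? data.flags : data.flags + "g";
  const regex = new RegExp(data.source, flags);
  let m;
  let count = 0;
  while ((m = regex.exec(data.text)) !== null) {
    // Skip zero-length matches so the loop cannot stall.
    if (m[0] === "") { regex.lastIndex++; continue; }
    // Same idea as the splice/join above: join the capture groups.
    post({ type: "match", value: m.slice(1).join("") });
    count++;
  }
  post({ type: "done", count: count });
}

// Attach the handler only when actually running inside a worker.
if (typeof WorkerGlobalScope !== "undefined" && self instanceof WorkerGlobalScope) {
  self.onmessage = function (event) {
    handleRequest(event.data, function (msg) { self.postMessage(msg); });
  };
}

// The same handler can be exercised on the main thread for comparison.
const received = [];
handleRequest(
  { source: "<([^>]+)>", flags: "g", text: "#include <a.h>\n#include <b.h>\n" },
  function (msg) { received.push(msg); }
);
console.log(received);
```

Keeping the matching logic in a plain function means the worker and the blocking loop run exactly the same code, which makes a timing comparison between them easier to trust.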
I created a test file; rather than pasting it here, I uploaded it to very reliable web hosting: Demo - Test data.
What I find very surprising is how large the difference is between web worker and regular browser execution of the RegExp. The results I got:
[WORKER]: Time elapsed:16.860s
[WORKER-SYNC]: Time elapsed:16.739s
[TIMEOUT]: Time elapsed:5.186s
[LOOP]: Time elapsed:5.028s
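The harness that produced these numbers is not included in the post. A minimal sketch of how the blocking `[LOOP]` case could be timed is below; `timeSync`, the sample regex, and the sample text are hypothetical stand-ins, and only the output format is copied from the results above.

```javascript
// Hypothetical timing helper; the original test page's harness is not shown.
function timeSync(label, fn) {
  const start = Date.now();
  fn();
  const seconds = (Date.now() - start) / 1000;
  // Mirror the "[LABEL]: Time elapsed:N.NNNs" format used in the results.
  const line = "[" + label + "]: Time elapsed:" + seconds.toFixed(3) + "s";
  console.log(line);
  return line;
}

const report = timeSync("LOOP", function () {
  const re = /a+/g; // stand-in regex, not the one from the post
  const text = "aaa b aa c a ".repeat(1000);
  while (re.exec(text) !== null) {
    // Drain all matches; nothing else to do per match.
  }
});
```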
You can also see that with my particular regular expression, the difference between a synchronous and an asynchronous loop is insignificant. I then tried using a match list instead of a lookahead expression, and the results changed a lot. Here are the changes to the old function:

    function processRegexUnique(regex, string, result, done, interval) {
      var matchList = arguments[5]||[];
      ... same as before ...
      while ((m = regex.exec(string))) {
        ... same as before ...
        if (matchList.indexOf(path)==-1) {
          result(path);
          matchList.push(path);
        }
        if (interval>=0) {
          processRegex.working = setTimeout(processRegex, interval,
                                            regex, string, result,
                                            done, interval, matchList);
          ... same as before ...
        }
      }
      ... same as before ...
    }

And the results:
[WORKER]: Time elapsed:0.062s
[WORKER-SYNC]: Time elapsed:0.023s
[TIMEOUT]: Time elapsed:12.250s (note to self: it's getting weirder every minute)
[LOOP]: Time elapsed:0.006s
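For reference, the match-list idea can be sketched as a standalone blocking function. This is an illustration under assumptions, not the code that was benchmarked: `collectUniqueMatches` is a hypothetical name, the sample regex and input are made up, and a `Set` replaces the `indexOf` lookup on the array.

```javascript
// Hypothetical helper, not from the original post: keeps each distinct
// match once, using a Set for membership checks instead of Array.indexOf.
function collectUniqueMatches(regex, text) {
  if (!regex.global) {
    throw new Error("regex needs the g flag, otherwise exec() never advances");
  }
  regex.lastIndex = 0; // start from the beginning of the text
  const seen = new Set();
  const unique = [];
  let m;
  while ((m = regex.exec(text)) !== null) {
    // Skip zero-length matches so the loop cannot stall.
    if (m[0] === "") { regex.lastIndex++; continue; }
    // Join the capture groups, like the splice/join in processRegex.
    const path = m.slice(1).join("");
    if (!seen.has(path)) {
      seen.add(path);
      unique.push(path);
    }
  }
  return unique;
}

// Made-up sample input with one repeated include.
const headers = '#include "lib/a.h"\n#include "lib/b.h"\n#include "lib/a.h"\n';
const uniquePaths = collectUniqueMatches(/#include "([^"]+)"/g, headers);
console.log(uniquePaths); // [ 'lib/a.h', 'lib/b.h' ]
```

With this approach the regex stays simple and the uniqueness check moves into plain JavaScript, which is the trade-off the match-list version makes compared to the lookahead expression.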
Can anyone explain such a difference in speed?
After a series of tests, I confirmed that this is a Mozilla Firefox issue (it affects all Windows desktop versions I tried). With Google Chrome, Opera, or even Firefox mobile, the regexp matches take about the same time, worker or not.
If you need this issue fixed, be sure to vote on the bug report on Bugzilla. I will try to add additional information if anything changes.