
Which Regular Expression Algorithm does Javascript use for Regex?

I was reading this article today on two different regular expression algorithms.

According to the article, old Unix tools like ed, sed, grep, egrep, awk, and lex all use what's called the Thompson NFA algorithm in their regular expressions...

However, newer tools like Java, Perl, PHP, and Python all use a different algorithm for their regular expressions, one that is much, much slower.

The article makes no mention at all of JavaScript's regex algorithm (and yes, I know there are various JS engines out there), but I was wondering if anybody knew which of those algorithms they use, and whether those algorithms should maybe be swapped out for Thompson NFA.
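To make the contrast between the two approaches concrete, here is a minimal sketch that counts the work each one does. It assumes a pattern made of n optional a's followed by n required a's, run against a string of n a's. The function names `backtrackSteps` and `stateSetSteps` are hypothetical, and neither function reflects how any real JavaScript engine is implemented.

```javascript
// Backtracking: try one choice at a time, undo it on failure.
// Pattern elements 0..n-1 are "a?", elements n..2n-1 are "a".
function backtrackSteps(n) {
  const input = "a".repeat(n);
  let steps = 0;
  function match(k, pos) {
    steps++;
    if (k === 2 * n) return pos === input.length;
    const hasA = pos < input.length && input[pos] === "a";
    if (k < n) {
      // Greedy "a?": try consuming first, then fall back to skipping.
      return (hasA && match(k + 1, pos + 1)) || match(k + 1, pos);
    }
    return hasA && match(k + 1, pos + 1);
  }
  const matched = match(0, 0);
  return { matched, steps };
}

// State-set simulation: track every state the NFA could be in at once.
// State i means "i pattern elements have been satisfied"; 2n accepts.
function stateSetSteps(n) {
  const input = "a".repeat(n);
  let steps = 0;
  // From an optional element (i < n) we may move to i + 1 for free.
  const closure = (states) => {
    const out = new Set(states);
    for (let i = 0; i < n; i++) {
      if (out.has(i)) out.add(i + 1);
    }
    return out;
  };
  let current = closure(new Set([0]));
  for (const ch of input) {
    const next = new Set();
    for (const s of current) {
      steps++;
      if (ch === "a" && s < 2 * n) next.add(s + 1);
    }
    current = closure(next);
  }
  return { matched: current.has(2 * n), steps };
}

for (const n of [5, 10, 15]) {
  console.log(n, backtrackSteps(n).steps, stateSetSteps(n).steps);
}
```

Both functions report a match, but the backtracking count at least doubles each time n grows by one, while the state-set count grows roughly with n squared.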

leeand00 asked Apr 07 '09 20:04





1 Answer

The ECMAScript language specification doesn't require any particular implementation of regular expressions, so that part of the question isn't well-formed. You're really asking about the particular implementation in a particular browser.

The reason Perl, Python, etc. use a slower algorithm, though, is that the regex language they define isn't really regular expressions. A real regular expression can be expressed as a finite state machine, but the regex language includes features, such as backreferences, that a finite state machine cannot handle. That's why the fashion is to just call it "regex" instead of talking about regular expressions.
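One commonly cited feature that takes regex beyond regular languages is the backreference. The following is a minimal sketch, with `isMirrored` as a hypothetical helper name: the pattern accepts some a's, a single b, and then exactly the same number of a's again. A finite state machine cannot recognize that language, because it would need to remember an unbounded count.

```javascript
// \1 must repeat exactly the text captured by the first group.
const mirrored = /^(a+)b\1$/;

function isMirrored(s) {
  return mirrored.test(s);
}

console.log(isMirrored("aaabaaa")); // true: three a's on each side
console.log(isMirrored("aaabaa"));  // false: three a's, then only two
```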

Update

Yes, in fact JavaScript regex isn't regular. Consider the syntax using `{n,m}`, that is, a match of between n and m repetitions of the preceding regex. Let d be the difference d = |n-m|. The syntax means there exists a string u x^d w that is acceptable, but a string u x^k w with k > d that is not. It follows via the pumping lemma for regular languages that this is not a regular language.

(augh. Thinko corrected.)

Charlie Martin answered Oct 08 '22 12:10
