Does .NET really use an NFA for its regular expression engine?

Tags: c#, .net, regex, algorithm

The MSDN article Details of Regular Expression Behavior says that the .NET developers decided to use a traditional NFA engine for regular expressions because it is faster than a POSIX NFA. But then it is not clear to me why this pattern runs exponentially slowly:

var regex = new Regex("(a|aa)*b");
var b = regex.IsMatch("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaac");

This simple match takes more than 30 minutes to execute. But if .NET used a traditional NFA, it could simulate the NFA and find a match in O(M*N) time in the worst case, where M is the pattern length and N is the text length, which is clearly not what happens here.

The article also explains that backtracking is the reason for the slow execution, but I still have some questions that I can't find answers to:

What is backtracking? Is it only reusing an already matched group, as in (a|b)c\1?
Does a traditional NFA support backtracking? If not, what modification does it need?
If an NFA supports it but needs a slower algorithm to simulate, why can't .NET check whether the pattern requires backtracking, use the slower algorithm if it does, and use the faster one if it doesn't?
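
The construct in the first question, a group that is referred to again later in the pattern, is normally called a backreference and is written \1 in .NET. It is a different thing from backtracking. A minimal sketch of what it matches (the class and method names are made up for illustration, and the pattern is anchored with ^ and $ only to keep the results easy to read):

```csharp
using System;
using System.Text.RegularExpressions;

static class BackreferenceDemo
{
    // \1 must repeat exactly the text that group 1 captured.
    public static bool Matches(string input) => Regex.IsMatch(input, @"^(a|b)c\1$");

    public static void Main()
    {
        Console.WriteLine(Matches("aca")); // True
        Console.WriteLine(Matches("bcb")); // True
        Console.WriteLine(Matches("acb")); // False: group 1 captured "a", so \1 cannot match "b"
    }
}
```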

asked Dec 12 '13 09:12

Arsen Mkrtchyan

1 Answer

You can compile a regular expression to an NFA or a DFA, although the DFA calculated from an NFA may be impractically large. You can match an NFA with or without backtracking, although the schemes that work without backtracking usually put more constraints on the regular expression language, and on which match is reported when there are many possible matches.

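Your example is slow because the matcher has to decide very often whether to match with a or aa, and whether to try matching the final b. Backtracking works like running a recursive function which, at each step, makes recursive calls on itself for each possibility: recursively match with a, and if that fails recursively match with aa, and if that fails try to match the final b. Because your input ends in c, no match exists, so the matcher ends up trying every way of splitting the run of a's into a and aa pieces, and the number of such splits grows exponentially (like the Fibonacci numbers) with the length of the run.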
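
To make the recursion concrete, here is a hand-written backtracking matcher for (a|aa)*b that counts how often it is entered. This is only a sketch of the idea; it is not how System.Text.RegularExpressions is implemented, and the names BacktrackDemo, MatchFrom and CountCalls are made up for the illustration.

```csharp
using System;

static class BacktrackDemo
{
    // Number of times MatchFrom was entered; a rough measure of the work done.
    public static long Calls;

    // Tries to match (a|aa)*b starting exactly at position pos.
    public static bool MatchFrom(string s, int pos)
    {
        Calls++;
        // Loop body, first alternative: consume "a", then match the rest.
        if (pos < s.Length && s[pos] == 'a' && MatchFrom(s, pos + 1))
            return true;
        // Loop body, second alternative: consume "aa", then match the rest.
        if (pos + 1 < s.Length && s[pos] == 'a' && s[pos + 1] == 'a' && MatchFrom(s, pos + 2))
            return true;
        // Leave the loop and require the final "b".
        return pos < s.Length && s[pos] == 'b';
    }

    // Runs the matcher on n copies of 'a' followed by 'c' and returns the call count.
    public static long CountCalls(int n)
    {
        Calls = 0;
        MatchFrom(new string('a', n) + "c", 0);
        return Calls;
    }

    public static void Main()
    {
        foreach (int n in new[] { 10, 20, 30 })
            Console.WriteLine($"n = {n}: {CountCalls(n)} calls");
    }
}
```

The call count satisfies C(n) = 1 + C(n-1) + C(n-2), which is the Fibonacci recurrence plus one, so each extra a multiplies the work by roughly 1.6.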

Microsoft seem to be saying that their sort of backtracking is faster than POSIX because POSIX backtracking will arrange for a recursive search that carries on until it is sure that it has found the longest possible match. The Microsoft version still backtracks, but it does not have extra checks that carry on until there is a guarantee that they have found the longest possible match. There is an example in http://msdn.microsoft.com/en-us/library/dsy130b4%28v=vs.110%29.aspx.
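
A small illustration of the difference, using only Regex.Match (the helper name FirstMatch is made up): the .NET engine tries alternatives from left to right and keeps the first one that succeeds, so the order of the alternatives changes the result. A matcher with POSIX leftmost-longest semantics would report aa for both patterns.

```csharp
using System;
using System.Text.RegularExpressions;

static class FirstVsLongest
{
    // Returns the text of the first match that the .NET engine reports.
    public static string FirstMatch(string pattern, string input) =>
        Regex.Match(input, pattern).Value;

    public static void Main()
    {
        Console.WriteLine(FirstMatch("a|aa", "aa")); // a  (first alternative wins)
        Console.WriteLine(FirstMatch("aa|a", "aa")); // aa
    }
}
```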

Regular expression matchers without backtracking can work by accepting input one character at a time, and keeping track of which states in the NFA are live at that time - there may be many such states. It is hard to make this work with back-references, because then the state of the NFA cannot be described by just saying whether a state is live or not.
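
As a sketch of that approach, here is a hand-built NFA for (a|aa)*b simulated by tracking the set of live states. The state numbering and the transition table are my own encoding of the pattern, not something the .NET library produces. The work per input character is bounded by the number of transitions, which is where the O(M*N) bound in the question comes from.

```csharp
using System;
using System.Collections.Generic;

static class NfaSimulation
{
    // State 0: at the start of the loop; state 1: halfway through "aa"; state 2: accepting.
    static readonly (int From, char On, int To)[] Transitions =
    {
        (0, 'a', 0), // alternative "a"
        (0, 'a', 1), // first half of alternative "aa"
        (1, 'a', 0), // second half of alternative "aa"
        (0, 'b', 2), // final "b"
    };

    const int Start = 0;
    const int Accept = 2;

    // Returns true if (a|aa)*b matches somewhere in input, like an unanchored IsMatch.
    public static bool IsMatch(string input)
    {
        var live = new HashSet<int>();
        foreach (char c in input)
        {
            live.Add(Start); // a match may begin at any position
            var next = new HashSet<int>();
            foreach (var t in Transitions)
                if (t.On == c && live.Contains(t.From))
                    next.Add(t.To);
            if (next.Contains(Accept))
                return true;
            live = next;
        }
        return false;
    }

    public static void Main()
    {
        // The input shape from the question: a long run of 'a' followed by 'c'.
        Console.WriteLine(IsMatch(new string('a', 65) + "c")); // False, after one pass over the input
        Console.WriteLine(IsMatch("aaab"));                    // True
    }
}
```

For what it's worth, .NET 7 and later offer RegexOptions.NonBacktracking, which gives up features such as backreferences in exchange for matching time that is linear in the input length.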

answered Oct 11 '22 20:10

mcdowella
