Adding a single character to my .NET RegEx causes it to hang

Tags: .net, regex, freeze

Here is the input data:

                                *** INVOICE ***                                

                              THE BIKE SHOP                              
                      1 NEW ROAD, TOWNVILLE,                       
                          SOMEWHERE, UK, AB1 2CD                          
                        TEL 01234-567890  

 To: COUNTER SALE                                   No:  243529 Page: 1

                                                    Date: 04/06/10 12:00

                                                    Ref:    Aiden   

 Cust No: 010000

Here is a regex that works (options: Singleline, IgnorePatternWhitespace, Compiled). It matches immediately and the groups are properly populated:

\W+INVOICE\W+
(?<shopAddr>.*?)\W+
To:\W+(?<custAddr>.*?)\W+
No:\W+(?<invNo>\d+).*?
Date:\W+(?<invDate>[0-9/ :]+)\W+
Ref:\W+(?<ref>[\w ]*?)\W+
Cust

As soon as I add the 'N' from Cust No to the regex, parsing the input hangs forever:

\W+INVOICE\W+
(?<shopAddr>.*?)\W+
To:\W+(?<custAddr>.*?)\W+
No:\W+(?<invNo>\d+).*?
Date:\W+(?<invDate>[0-9/ :]+)\W+
Ref:\W+(?<ref>[\w ]*?)\W+
Cust N

If I add an "any character" token instead:

\W+INVOICE\W+
(?<shopAddr>.*?)\W+
To:\W+(?<custAddr>.*?)\W+
No:\W+(?<invNo>\d+).*?
Date:\W+(?<invDate>[0-9/ :]+)\W+
Ref:\W+(?<ref>[\w ]*?)\W+
Cust .

It works, but as soon as I add a fixed character, the regex hangs again:

\W+INVOICE\W+
(?<shopAddr>.*?)\W+
To:\W+(?<custAddr>.*?)\W+
No:\W+(?<invNo>\d+).*?
Date:\W+(?<invDate>[0-9/ :]+)\W+
Ref:\W+(?<ref>[\w ]*?)\W+
Cust ..:

Can anyone advise why adding something so trivial would cause it to fall over? Can I enable some kind of tracing to watch the matching activity and see whether it is getting stuck in catastrophic backtracking?

asked Jun 04 '10 13:06

Matt

1 Answer

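With RegexOptions.IgnorePatternWhitespace, you're telling the engine to ignore whitespace in your pattern. Thus, when you write Cust No in the pattern, it really means CustNo, which doesn't match the input. This is the cause of the problem: because the overall match can never succeed, the engine has to backtrack through every way of dividing the input between the lazy .*? groups and the \W+ runs before it can give up, which is why it appears to hang. It also explains the other two observations: Cust . is read as Cust. and the dot matches the space in the input, whereas Cust ..: is read as Cust..: and fails because three characters (space, N, o) precede the colon.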
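To see the effect in isolation, here is a minimal sketch that runs the pattern tails from the question against just the last line of the invoice. The class name and the reduced input string are only for illustration; the two escaped variants at the end are additions, not patterns from the question.

```csharp
using System;
using System.Text.RegularExpressions;

class PatternWhitespaceDemo
{
    static void Main()
    {
        // Last line of the invoice from the question (illustrative excerpt).
        const string input = " Cust No: 010000";
        const RegexOptions options = RegexOptions.IgnorePatternWhitespace;

        string[] patterns =
        {
            @"Cust",      // plain literal
            @"Cust N",    // parsed as CustN
            @"Cust .",    // parsed as Cust. (the dot can consume the space)
            @"Cust ..:",  // parsed as Cust..: (input has three characters before ':')
            @"Cust\ No",  // escaped space is kept as a literal space
            @"Cust\sNo",  // \s matches the space as well
        };

        foreach (string pattern in patterns)
        {
            // Report whether each pattern finds a match anywhere in the input.
            Console.WriteLine("{0,-10} -> {1}", pattern, Regex.IsMatch(input, pattern, options));
        }
    }
}
```

The tails that hang in the full regex (Cust N and Cust ..:) are exactly the ones that report no match here; in the short input the failure is immediate, because there are no lazy groups in front of them to backtrack through.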

From the documentation:

By default, white space in a regular expression pattern is significant; it forces the regular expression engine to match a white-space character in the input string. [...]

The RegexOptions.IgnorePatternWhitespace option, or the x inline option, changes this default behavior as follows:

Unescaped white space in the regular expression pattern is ignored. To be part of a regular expression pattern, white-space characters must be escaped (e.g. as \s or "\ ").

So instead of Cust No, in IgnorePatternWhitespace mode, you must write Cust\ No, because otherwise it's interpreted as CustNo.
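As a rough sketch of the corrected regex in use: the input below is an abbreviated copy of the invoice (the spacing is not byte-exact), and the class name is made up for the example. The match timeout is an extra safeguard that is not part of the original question; it assumes .NET Framework 4.5 or later, where the Regex constructor accepts a TimeSpan. It is not tracing, but it turns an apparent hang into an exception you can catch.

```csharp
using System;
using System.Text.RegularExpressions;

class InvoiceParseDemo
{
    static void Main()
    {
        // Abbreviated invoice text; the spacing is illustrative only.
        string input =
            "   *** INVOICE ***\n\n" +
            "   THE BIKE SHOP\n" +
            "   1 NEW ROAD, TOWNVILLE,\n" +
            "   SOMEWHERE, UK, AB1 2CD\n" +
            "   TEL 01234-567890\n\n" +
            " To: COUNTER SALE          No:  243529 Page: 1\n\n" +
            "                           Date: 04/06/10 12:00\n\n" +
            "                           Ref:    Aiden   \n\n" +
            " Cust No: 010000\n";

        // Same pattern as in the question, with the space in "Cust No" escaped.
        // Line breaks and indentation are ignored under IgnorePatternWhitespace;
        // the space inside the character class [0-9/ :] is still literal.
        string pattern = @"
            \W+INVOICE\W+
            (?<shopAddr>.*?)\W+
            To:\W+(?<custAddr>.*?)\W+
            No:\W+(?<invNo>\d+).*?
            Date:\W+(?<invDate>[0-9/ :]+)\W+
            Ref:\W+(?<ref>[\w ]*?)\W+
            Cust\ No";

        var regex = new Regex(
            pattern,
            RegexOptions.Singleline | RegexOptions.IgnorePatternWhitespace | RegexOptions.Compiled,
            TimeSpan.FromSeconds(2)); // assumed limit; pick whatever suits your data

        try
        {
            Match m = regex.Match(input);
            if (m.Success)
            {
                Console.WriteLine("invNo:   " + m.Groups["invNo"].Value);
                Console.WriteLine("invDate: " + m.Groups["invDate"].Value.Trim());
                Console.WriteLine("ref:     " + m.Groups["ref"].Value.Trim());
            }
            else
            {
                Console.WriteLine("No match");
            }
        }
        catch (RegexMatchTimeoutException)
        {
            // Thrown when matching exceeds the timeout, e.g. due to heavy backtracking.
            Console.WriteLine("Matching timed out; the pattern is probably backtracking heavily.");
        }
    }
}
```

If you change Cust\ No back to Cust No in this sketch, the pattern can no longer match. On an input this short it may simply report no match rather than time out, since the cost of backtracking grows with the length of the text.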

answered Sep 27 '22 01:09

polygenelubricants
