Why does this regular expression kill the Java regex engine?

Tags: java, regex

I have this naive regex "<([\s]|[^<])+?>" (excluding the quotation marks). It looks straightforward, but it turns out to be evil when run against the HTML text below: it sends the Java regular expression engine into what appears to be an infinite loop.

I have another regex ("<.+?>"), which does roughly the same thing, but it doesn't kill anything. Do you know why this happens?

<script language="JavaScript" type="text/javascript">
        var numDivs, layerName;
        layerName = "lnavLayer";
        catLinkName = "category";
        numDivs = 2;
        function toggleLayer(layerID){
            if (!(navigator.appName == "Netscape" && navigator.appVersion.substr(0, 1) < 5)){
                thisLayer = document.getElementById(layerName + layerID);
                categoryLink = document.getElementById(catLinkName + layerID);
                closeThem();
                if (thisLayer.className == 'subnavDefault'){
                    thisLayer.className = 'subnavToggled';
                    categoryLink.className = 'leftnavLinkSelectedSection';
                }
            }
        }
        function closeThem(){
            for(x = 0; x < numDivs; x++){
                theLayer = document.getElementById(layerName + (x
+ 1));
                thecategoryLink = document.getElementById(catLinkName + (x + 1));
                theLayer.className = 'subnavDefault';
                thecategoryLink.className = 'leftnavLink';
            }
        } var flag = 0; var lastClicked = 0
    //-->
    </script>

It even keeps looping with an online Java regex tool (such as www.fileformat.info/tool/regex.htm) or a utility like RegexBuddy.

asked Nov 13 '08 23:11

Martin08

1 Answer

The reason the Java regex engine crashes is that this part of your regex causes a stack overflow (indeed!):

[\s]|[^<]

What happens here is that every character matched by \s can also be matched by [^<]. That means there are two ways to match each whitespace character. If we represent the two character classes with A and B:

A|B

Then a string of three spaces could be matched as AAA, AAB, ABA, ABB, BAA, BAB, BBA, or BBB. In other words, the complexity of this part of the regex is 2^N, where N is the number of consecutive whitespace characters. This will kill any regex engine that doesn't have any safeguards against what I call catastrophic backtracking.
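To make the 2^N growth concrete, here is a minimal sketch that enumerates the A/B choices for a run of whitespace characters. It does not call the regex engine at all; it only counts the alternatives a backtracking engine could have to explore in the worst case. The class and method names (AlternationPaths, paths) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class AlternationPaths {
    // Lists every way a run of n whitespace characters can be divided
    // between alternative A ([\s]) and alternative B ([^<]).
    static List<String> paths(int n) {
        List<String> result = new ArrayList<>();
        build(n, new StringBuilder(), result);
        return result;
    }

    private static void build(int remaining, StringBuilder prefix, List<String> out) {
        if (remaining == 0) {
            out.add(prefix.toString());
            return;
        }
        // Each whitespace character can be consumed by either alternative.
        for (char alternative : new char[] {'A', 'B'}) {
            prefix.append(alternative);
            build(remaining - 1, prefix, out);
            prefix.setLength(prefix.length() - 1); // undo the choice (backtrack)
        }
    }

    public static void main(String[] args) {
        System.out.println(paths(3)); // [AAA, AAB, ABA, ABB, BAA, BAB, BBA, BBB]
        for (int n : new int[] {1, 2, 3, 10}) {
            System.out.println(n + " whitespace chars -> " + paths(n).size() + " paths");
        }
    }
}
```

The count doubles with every extra whitespace character, and the sample text in the question contains long runs of indentation.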

When using alternation (vertical bar) in a regex, always make sure the alternatives are mutually exclusive. That is, at most one of the alternatives may be allowed to match any given bit of text.
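As one possible way to apply this to the regex from the question: [\s] is already a subset of [^<], so the alternation can be collapsed into a single character class, leaving only one way to consume each character. The sketch below assumes that rewrite; it is not something the question or this answer spells out, and the names (ExclusiveAlternatives, firstTag) are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExclusiveAlternatives {
    // Assumed rewrite of <([\s]|[^<])+?> with the overlapping alternation
    // replaced by the single class [^<], which matches the same characters.
    static final Pattern SINGLE_CLASS = Pattern.compile("<[^<]+?>");

    // Returns the first tag-like match in the input, or null if there is none.
    static String firstTag(String html) {
        Matcher m = SINGLE_CLASS.matcher(html);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String html = "<script language=\"JavaScript\" type=\"text/javascript\">\n"
                + "        var numDivs, layerName;";
        System.out.println(firstTag(html));
    }
}
```

Unlike the original pattern, this version has no capturing group, so code that relied on group(1) would need adjusting.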

answered Oct 02 '22 13:10

Jan Goyvaerts
