This article argues that regular expressions cannot match nested structures because regexes are equivalent to finite automata. The author then offers a list of problems, and the answer given states that the following cannot be solved using regexes:

1. matching an XML element
2. matching a C/VB/C# math expression
3. matching a valid regex

Items 2 and 3 can contain arbitrarily nested brackets, and that nesting is what regexes cannot handle. But why is it impossible to match an XML element? (The author didn't provide examples.)

Why is it that regex cannot match an XML element?

2 Answers

You can match a limited subset of HTML tags if you know in advance which tags you need to match.

But you can't (reliably or nicely) parse arbitrary HTML, because HTML is not a regular language.
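
As a rough illustration of that limit, here is a small Python sketch. The pattern and the sample strings are made up for this example and do not come from the article. A regex written for one known tag works on flat input, but it pairs the opening tag with the wrong closing tag as soon as the same tag nests:

```python
import re

# Hypothetical pattern for one known tag; non-greedy, so it stops at the first </div>.
DIV = re.compile(r"<div>(.*?)</div>", re.DOTALL)

flat = "<div>hello</div>"
nested = "<div>outer <div>inner</div> tail</div>"

flat_match = DIV.search(flat).group(1)
nested_match = DIV.search(nested).group(1)

print(flat_match)    # hello
print(nested_match)  # outer <div>inner   (the inner </div> closed the outer match)
```

A greedy `.*` has the opposite problem: it runs to the last `</div>` in the input, which is wrong as soon as two sibling elements appear. Pairing tags correctly means tracking nesting depth with a counter or a stack, which a classical regular expression does not have.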

180

answered Nov 04 '22 19:11

alex

How would you match this valid XML with regex?

<!--<d>>--<<--><div class='foo' id="bar" inline></div>

It's like making a wooden car. Sure, you can try to do it, but why?

And matching is only the first step; after that comes parsing the XML. How would you extract a potentially unbounded number of attributes from a potentially unbounded number of elements using a fixed, finite set of capture groups? The structure of regexes simply doesn't allow it.
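
To make the point about a finite set of groups concrete, here is a minimal Python sketch. The sample element and the pattern are assumptions invented for illustration, and every attribute is given an explicit value so that `xml.etree.ElementTree` accepts the input. A repeated capture group keeps only its last match, while a parser returns every attribute:

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample element with three attributes.
sample = "<div class='foo' id=\"bar\" title='baz'></div>"

# One capture group inside a repetition: Python's re keeps only the final repetition.
pattern = re.compile(r"""<div(?:\s+(\w+)=(?:'[^']*'|"[^"]*"))*\s*>""")
last_name = pattern.match(sample).group(1)

# An XML parser exposes all attributes, however many there are.
attributes = ET.fromstring(sample).attrib

print(last_name)         # title
print(len(attributes))   # 3
```

You could add one group per attribute, but then the pattern only handles elements with at most that many attributes, which is the limitation described above.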

answered Nov 04 '22 18:11

Blender

Tags: language-agnostic, regex, xml

Frankie Ribery
