Why is it that regex cannot match an XML element?

This article argues that regular expressions cannot match nested structures because regexes are equivalent to finite automata, which can only remember a fixed, finite amount of state.
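A minimal Python sketch of the finite-automaton argument, using Python's re module as a stand-in for textbook regular expressions (the helper names nested_parens_regex and balanced are hypothetical). A regex can recognize balanced parentheses only up to some fixed depth, and each extra level needs a larger pattern. A checker for arbitrary depth needs a counter, which is exactly the unbounded memory a finite automaton lacks.

```python
import re


def nested_parens_regex(max_depth: int) -> str:
    """Hypothetical helper: regex for balanced parentheses nested at most max_depth deep."""
    group = r"\(\)"  # depth 1: a bare "()"
    for _ in range(max_depth - 1):
        # Wrap the previous level: "(" + any number of shallower groups + ")"
        group = r"\((?:" + group + r")*\)"
    # A whole string is a sequence of such groups.
    return r"(?:" + group + r")*"


def balanced(s: str) -> bool:
    """Hypothetical helper: counter-based check that works for any depth."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # closing bracket with nothing open
                return False
    return depth == 0


pattern = nested_parens_regex(2)
print(bool(re.fullmatch(pattern, "(())()")))  # True: depth 2 is accepted
print(bool(re.fullmatch(pattern, "((()))")))  # False: depth 3 is rejected
print(balanced("((((((()))))))"))             # True: any depth works
```

The pattern from nested_parens_regex(n) grows with n, so every fixed pattern has some deeper input it rejects. By contrast, balanced needs only one integer, but that integer is unbounded.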

He then lists the following problems as impossible to solve with regexes:

  1. matching an XML element
  2. matching a C/VB/C# math expression
  3. matching a valid regex

Since 2 and 3 can contain arbitrarily nested brackets, they are unsolvable with regexes. But why is it impossible to match an XML element? (He didn't provide examples.)
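One concrete reason, sketched in Python with made-up sample strings: an XML element can contain child elements with the same name, so finding its end tag means counting the open tags in between. A lazy pattern stops at the first closing tag, which belongs to the inner element. A greedy pattern runs to the last one, even across separate sibling elements.

```python
import re

nested = "<div><div>inner</div> tail</div>"
siblings = "<div>a</div><div>b</div>"

# Lazy: stops at the FIRST "</div>", which closes the inner element.
lazy = re.search(r"<div>.*?</div>", nested).group()

# Greedy: runs to the LAST "</div>", swallowing two separate elements.
greedy = re.search(r"<div>.*</div>", siblings).group()

print(lazy)    # <div><div>inner</div>
print(greedy)  # <div>a</div><div>b</div>
```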

Frankie Ribery asked Jun 07 '11 02:06



2 Answers

You can match a limited subset of HTML tags if you know in advance which tags will be matched.

But you can't (reliably or nicely) parse arbitrary HTML. It is not a regular language.
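Here is a Python sketch of that distinction. It assumes well-formed XML-style input, so the standard library's xml.etree.ElementTree can stand in for a real parser. For messy real-world HTML, an HTML parser would play this role. A regex is adequate for one known, non-nesting tag, but once elements nest, the parser should do the work.

```python
import re
import xml.etree.ElementTree as ET

# Known, flat tag: a regex is good enough.
page = "<head><title>Hello</title></head>"
title = re.search(r"<title>([^<]*)</title>", page).group(1)

# Arbitrary nesting: let a parser build the tree instead.
doc = "<list><item>a<list><item>b</item></list></item></list>"
root = ET.fromstring(doc)
items = [item.text for item in root.iter("item")]  # document order

print(title)  # Hello
print(items)  # ['a', 'b']
```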

alex answered Nov 04 '22 19:11


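How would you match this markup with a regex? (Strictly speaking it is HTML-style markup: XML forbids "--" inside comments and requires every attribute to have a value.)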

<!--<d>>--<<--><div class='foo' id="bar" inline></div>

It's like building a wooden car. Sure, you can try, but why?
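To make the example concrete, here is a Python sketch comparing a naive tag regex with the standard library's html.parser (the class name TagCollector is hypothetical). HTMLParser is used because it accepts the valueless inline attribute, which a strict XML parser would reject. The regex also "finds" a d tag hidden inside the comment. The parser treats that text as a comment and reports only the div start tag.

```python
import re
from html.parser import HTMLParser

SNIPPET = """<!--<d>>--<<--><div class='foo' id="bar" inline></div>"""

# Naive approach: any "<" followed by a word counts as a start tag.
naive_tags = re.findall(r"<(\w+)", SNIPPET)


class TagCollector(HTMLParser):
    """Hypothetical helper that records start tags with their attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value); value is None for bare attributes.
        self.tags.append((tag, attrs))


collector = TagCollector()
collector.feed(SNIPPET)
collector.close()

print(naive_tags)      # ['d', 'div']
print(collector.tags)  # [('div', [('class', 'foo'), ('id', 'bar'), ('inline', None)])]
```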

Then comes the actual parsing of the XML. How would you extract a possibly unbounded number of attributes from an unbounded number of elements using a fixed number of capture groups? The nature and structure of regexes simply don't allow it.
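You can see the "fixed number of groups" problem directly in Python's re module, shown below with a made-up tag. A capture group inside a repetition is overwritten on each pass, so one match keeps only the last attribute name. Collecting all of them requires looping over matches in the host language (here via re.findall), which means stepping outside a single regex match. (.NET's capture collections are an exception: they keep every repetition.)

```python
import re

tag = '<div a="1" b="2" c="3">'

# One group inside (...)*: each repetition overwrites the previous capture.
single = re.fullmatch(r'<div(?:\s+(\w+)="[^"]*")*>', tag)
last_only = single.group(1)

# Getting every name means iterating over matches outside the regex.
all_names = re.findall(r'(\w+)="[^"]*"', tag)

print(last_only)  # c
print(all_names)  # ['a', 'b', 'c']
```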

Blender answered Nov 04 '22 18:11