What is the regex expression for CDATA

Tags: regex, parsing, xml, cdata, lex

Hi, I have an example CDATA section here:

<![CDATA[asd[f]]]>

and

<tag1><![CDATA[asd[f]]]></tag1><tag2><![CDATA[asd[f]]]></tag2>

The CDATA regex I have is not able to recognize this:

"<![CDATA["([^\]]|"]"[^\]]|"]]"[^>])*"]]>"

This does not work either:

"<![CDATA["[^\]]*[\]]{2,}([^\]>][^\]]*[\]]{2,})*">"

Will someone please give me a regex for <![CDATA[asd[f]]]>? I need to use it in Lex/Flex.

I have answered this question; please vote on my answer, thanks.

asked Jan 06 '11 15:01

Freddy Chua

2 Answers

Easy enough, it should be this:

<!\[CDATA\[.*?\]\]>

At least it works on regexpal.com
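
For anyone who wants to try that pattern outside regexpal, here is a minimal sketch in Python's `re` module. It is only a check of the non-greedy pattern against the two inputs from the question. The `re.DOTALL` flag and the variable names are my own additions, and flex itself does not support the non-greedy `*?` quantifier, so this will not carry over to a lexer rule unchanged.

```python
import re

# Non-greedy pattern from the answer above. re.DOTALL is an assumption added
# here so that "." also spans newlines inside a CDATA section.
CDATA = re.compile(r"<!\[CDATA\[.*?\]\]>", re.DOTALL)

single = "<![CDATA[asd[f]]]>"
double = "<tag1><![CDATA[asd[f]]]></tag1><tag2><![CDATA[asd[f]]]></tag2>"

# With no capture groups, findall returns each full match as a string.
single_matches = CDATA.findall(single)
double_matches = CDATA.findall(double)

print(single_matches)
print(double_matches)
```

Each section is matched separately because `.*?` stops at the first position where `]]>` can follow.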

answered Sep 28 '22 08:09

Sean Patrick Floyd

The problem is that this is rather awkward to match with the sort of regular expressions used in lex. If you had a system that supported Perl-style regular expressions (plain POSIX EREs lack both features used below), then you'd be able to use either:

<!\[CDATA\[(.*?)\]\]>

or

<!\[CDATA\[((?:[^]]|\](?!\]>))*)\]\]>

(The first uses non-greedy quantifiers, the second uses negative lookahead constraints. OK, it uses non-capturing parens too, but you can use capturing ones there instead; that's not so important.)
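
To see what the lookahead buys you, here is a small Python sketch comparing a plain greedy `.*` with the lookahead form on the two-tag input from the question. The variable names are mine, and I have written the character class as `[^\]]` rather than `[^]]` purely for readability; this is an illustration in Python's `re`, not something lex will accept.

```python
import re

text = "<tag1><![CDATA[asd[f]]]></tag1><tag2><![CDATA[asd[f]]]></tag2>"

# Greedy ".*" runs to the LAST "]]>", so both sections collapse into one match.
greedy = re.compile(r"<!\[CDATA\[(.*)\]\]>")

# Lookahead form: a "]" is consumed only when it is not immediately followed
# by "]>", so even a greedy "*" cannot run past the first terminator.
lookahead = re.compile(r"<!\[CDATA\[((?:[^\]]|\](?!\]>))*)\]\]>")

# With one capture group, findall returns the captured content of each match.
greedy_groups = greedy.findall(text)
lookahead_groups = lookahead.findall(text)

print(greedy_groups)
print(lookahead_groups)  # ['asd[f]', 'asd[f]']
```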

It's probably easier to handle this the way C-style comments are handled in lex: have one rule that matches the start of the CDATA (on <![CDATA[) and puts the lexer into a separate state, which it leaves on seeing ]]>, collecting all the characters in between. This is instructive on the topic (and it seems that this is an area where flex and lex differ) and it covers all the strategies that you can take to make this work.
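
The two-state idea can be sketched without any regex at all. What follows is a hand-written Python imitation of that strategy, not flex syntax: the `in_cdata` flag stands in for the separate lexer state (in flex this would be an exclusive start condition declared with `%x` and entered with `BEGIN`), and the function name `scan_cdata` is hypothetical.

```python
CDATA_START = "<![CDATA["
CDATA_END = "]]>"


def scan_cdata(text):
    """Return the contents of every complete CDATA section in text."""
    sections = []
    buffer = []
    in_cdata = False  # stands in for the lexer's separate state
    i = 0
    while i < len(text):
        if not in_cdata:
            if text.startswith(CDATA_START, i):
                in_cdata = True  # rule 1: the opener switches state
                buffer = []
                i += len(CDATA_START)
            else:
                i += 1  # ordinary input, ignored in this sketch
        elif text.startswith(CDATA_END, i):
            sections.append("".join(buffer))
            in_cdata = False  # rule 2: the terminator switches back
            i += len(CDATA_END)
        else:
            buffer.append(text[i])  # rule 3: collect one content character
            i += 1
    return sections


print(scan_cdata("<tag1><![CDATA[asd[f]]]></tag1><tag2><![CDATA[asd[f]]]></tag2>"))
```

Because the terminator is checked before any single character is collected, the scanner never has to decide in advance whether a `]` belongs to the content. An unterminated section is silently dropped here, which a real lexer would probably want to report.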

Note that the cause of all these problems is that it's very difficult to write a rule with simple regular expressions that expresses the fact that a greedy regular expression must only match a ] if it is not followed by ]>. It's much easier if you've only got a two-character (or single-character!) end-of-interesting-section sequence, because you don't need such an elaborate state machine.
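
To give a feel for how elaborate such a rule gets, here is one lookahead-free pattern that I believe expresses the constraint. It is my own construction, not something from this thread. It uses only character classes, alternation and repetition, so it should in principle translate to lex-style syntax, but I have only checked it with Python's `re`, so treat it as a sketch.

```python
import re

# Hypothetical lookahead-free pattern (an assumption, not from the thread).
# Content alternatives:
#   [^\]]          any character other than "]"
#   \][^\]]        a lone "]" followed by a non-"]"
#   \]\]+[^\]>]    two or more "]" followed by neither "]" nor ">"
# Terminator: two or more "]" followed by ">" (extra "]" belong to the content).
PLAIN = re.compile(r"<!\[CDATA\[([^\]]|\][^\]]|\]\]+[^\]>])*\]\]+>")

double = "<tag1><![CDATA[asd[f]]]></tag1><tag2><![CDATA[asd[f]]]></tag2>"

# group(0) is the whole section; the repeated group only keeps its last piece.
plain_sections = [m.group(0) for m in PLAIN.finditer(double)]
print(plain_sections)
```

The question's first attempt fails on `]]]>` because its third alternative treats `]]` followed by `]` as content and then cannot find the terminator; allowing a run of two or more `]` on both sides is what fixes that.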

answered Sep 28 '22 08:09

Donal Fellows
