I want to write a method for a Java class. The method accepts as input a string of XML data as given below.
<?xml version="1.0" encoding="UTF-8"?>
<library>
<book>
<name> <> Programming in ANSI C <> </name>
<author> <> Balaguruswamy <> </author>
<comment> <> This comment may contain xml entities such as &amp;, &lt; and &gt;. <> </comment>
</book>
<book>
<name> <> A Mathematical Theory of Communication <> </name>
<author> <> Claude E. Shannon <> </author>
<comment> <> This comment also may contain xml entities. <> </comment>
</book>
<!-- This library contains more than ten thousand books. -->
</library>
The XML string contains many substrings that start and end with <>. These substrings may contain XML entities such as &gt;, &lt;, &amp;, &apos; and &quot;. The method needs to replace them with >, <, &, ' and " respectively.
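A minimal sketch of the entity-to-character mapping described above, assuming only the five predefined XML entities occur (no numeric references such as &#60;). The class name EntityDecoder is hypothetical. Note that the order of the replacements matters: &amp; has to be decoded last, otherwise &amp;lt; would be decoded twice and end up as <.

```java
/**
 * Hypothetical helper: decodes the five predefined XML entities in a string.
 * Assumption: no numeric character references (e.g. &#60;) need handling.
 */
final class EntityDecoder {
    private EntityDecoder() {}

    static String decode(String s) {
        // &amp; is decoded last; decoding it first would turn "&amp;lt;"
        // into "&lt;" and then wrongly into "<".
        return s.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&apos;", "'")
                .replace("&quot;", "\"")
                .replace("&amp;", "&");
    }

    public static void main(String[] args) {
        System.out.println(decode("Tom &amp; Jerry &lt;3"));   // Tom & Jerry <3
        System.out.println(decode("&amp;lt; stays literal"));  // &lt; stays literal
    }
}
```

This decodes the whole string it is given; restricting it to the <>...<> parts still needs a pattern that finds those parts.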
Is there any regular-expression method in Java to accomplish this task?
Is this data being passed to you, or do you control it? If you control it, I would suggest using a CDATA section. If you are unsure what data will be entered into the XML elements, just wrap everything in a CDATA section before it is saved to the DB.
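A minimal sketch of wrapping text in a CDATA section before storing it. The class CdataWrapper is hypothetical. One edge case is that a CDATA section cannot contain the sequence ]]>, so this sketch splits any occurrence across two adjacent CDATA sections, a common workaround.

```java
/**
 * Hypothetical helper: wraps arbitrary text in a CDATA section so that
 * &, < and > can be stored without escaping.
 */
final class CdataWrapper {
    private CdataWrapper() {}

    static String wrap(String text) {
        // "]]>" becomes "]]" + "]]><![CDATA[" + ">": the first section ends
        // after "]]" and a new section starts with ">".
        return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }

    public static void main(String[] args) {
        System.out.println(wrap("Tom & Jerry <3"));  // <![CDATA[Tom & Jerry <3]]>
        System.out.println(wrap("a]]>b"));           // <![CDATA[a]]]]><![CDATA[>b]]>
    }
}
```

A parser reading the second result sees the text a]]>b, because the two adjacent sections are concatenated.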
If you do not control the data, then as far as I know this will take a fair amount of code because of the number of edge cases you may have to handle (whether a block is starting, whether one is ending, whether one has already ended, etc.). That is not something a simple regex can deal with.
Here is a very basic regex for the <> markers; beyond that, I really believe things get extremely complicated:
<>(.*?)<>    //Matches one <>...<> block; group 1 is the text between the markers
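A sketch of how a <>...<> pattern can be combined with Matcher.appendReplacement so that entities are decoded only inside the marked blocks. It needs Java 9+ (for the StringBuilder overload of appendReplacement and for Map.of) and assumes each block sits on a single line and blocks are never nested; MarkerDecoder and its method names are hypothetical. Entities are decoded in one regex pass instead of chained replace calls, so &amp;lt; becomes &lt; and is not decoded a second time.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: decodes the five predefined XML entities, but only
 * inside <>...<> blocks. Assumes single-line, non-nested blocks.
 */
final class MarkerDecoder {
    // One <>...<> block; group 1 is the text between the markers.
    private static final Pattern BLOCK = Pattern.compile("<>(.*?)<>");
    // One predefined entity; group 1 is its name.
    private static final Pattern ENTITY = Pattern.compile("&(lt|gt|amp|apos|quot);");
    private static final Map<String, String> CHARS = Map.of(
            "lt", "<", "gt", ">", "amp", "&", "apos", "'", "quot", "\"");

    private MarkerDecoder() {}

    // Single pass over the text, so "&amp;lt;" becomes "&lt;", not "<".
    static String decodeEntities(String text) {
        Matcher m = ENTITY.matcher(text);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            m.appendReplacement(sb, Matcher.quoteReplacement(CHARS.get(m.group(1))));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    static String decodeBlocks(String xml) {
        Matcher m = BLOCK.matcher(xml);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            // Keep the <> markers and decode only what lies between them.
            String decoded = "<>" + decodeEntities(m.group(1)) + "<>";
            m.appendReplacement(sb, Matcher.quoteReplacement(decoded));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String in = "<comment> <> a &lt; b &amp;&amp; c <> </comment> &amp;";
        System.out.println(decodeBlocks(in));
        // <comment> <> a < b && c <> </comment> &amp;
    }
}
```

Text outside the markers (the trailing &amp; in the example) is left untouched. Note that the result is no longer well-formed XML, since the decoded < and & now appear as raw characters.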