String replace in substring

Tags: java, regex, xml

I want to write a method for a Java class. The method accepts as input a string of XML data as given below.

<?xml version="1.0" encoding="UTF-8"?>
<library>

    <book>
        <name> <> Programming in ANSI C <> </name>
        <author> <>  Balaguruswamy <> </author>
        <comment> <> This comment may contain xml entities such as &, < and >. <> </comment>
    </book>

    <book>
        <name> <> A Mathematical Theory of Communication <> </name>
        <author> <> Claude E. Shannon <> </author>
        <comment> <> This comment also may contain xml entities. <> </comment>
    </book>

    <!-- This library contains more than ten thousand books. -->
</library>

The XML string contains many substrings that start and end with <>. These substrings may contain XML special characters such as >, <, &, ' and ". The method needs to replace them with &gt;, &lt;, &amp;, &apos; and &quot; respectively.

Is there any regular-expression method in Java to accomplish this task?

Mohammed H asked Mar 18 '12 03:03

2 Answers

Is this data being passed to you, or can you control how it is produced? If you control it, I would suggest using a CDATA block. If you are unsure about the data being entered into the XML elements, wrap everything in a CDATA section before it is saved to the DB.
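
As a rough illustration of the CDATA suggestion, here is a minimal sketch. The class and method names (`CdataWrapper`, `toCdata`) are made up for this example. The one edge case it handles is a literal `]]>` inside the text, which would otherwise close the section early, so it is split across two adjacent CDATA sections.

```java
final class CdataWrapper {
    /**
     * Wraps text in a CDATA section (hypothetical helper).
     * A literal "]]>" in the text is split across two sections so that
     * it cannot terminate the CDATA section prematurely.
     */
    static String toCdata(String text) {
        return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }

    public static void main(String[] args) {
        // Special characters need no escaping inside CDATA.
        System.out.println("<comment>" + toCdata("a & b < c > d") + "</comment>");
    }
}
```

Inside a CDATA section the parser treats &, < and > as plain character data, which is why no entity replacement is needed when the producer of the data can be changed.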

If you do not have control over this, then as far as I know it will take a fair amount of coding because of the number of edge cases you may have to deal with. It is not something a simple regex can handle: you need to track whether a valid block is starting, whether one is ending, whether one has already ended, and so on.

Here is a very basic regex for the <> case; I believe the remaining cases get extremely complicated.

<>(.*?)<>   // matches one <> ... <> block (non-greedy)
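
For the simple case only, a regex-based replacement could look like the sketch below. It rests on assumptions the question does not confirm: the text between two markers never contains a literal `<>` itself, markers always come in pairs, and the markers should be dropped from the output. The names `MarkerEscaper`, `escapeXml` and `escapeBlocks` are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MarkerEscaper {
    // One <> ... <> block; DOTALL lets the content span several lines.
    private static final Pattern BLOCK = Pattern.compile("<>(.*?)<>", Pattern.DOTALL);

    /** Escapes the five XML special characters; '&' goes first so it is not escaped twice. */
    static String escapeXml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("'", "&apos;")
                .replace("\"", "&quot;");
    }

    /** Replaces each <> ... <> block with its escaped content (assumption: markers are dropped). */
    static String escapeBlocks(String xml) {
        Matcher m = BLOCK.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // quoteReplacement keeps '$' and '\' in the content from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(escapeXml(m.group(1))));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeBlocks("<comment> <> a & b < c > d <> </comment>"));
    }
}
```

Text outside the markers, including the real tags and the XML comment, is copied through unchanged by `appendTail` and `appendReplacement`. An unpaired marker is left as it is, which is one of the edge cases a plain regex does not resolve.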
Justin Pihony answered Sep 18 '22 22:09


You can follow these steps:

  1. Read the XML file using DOM or SAX
  2. Replace the strings using a regular expression
  3. Write the XML file using DOM or SAX
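
The three steps above can be sketched with the JDK's built-in DOM and transformer APIs. This is only an illustration under one important assumption: the input must already parse as XML, which the question's raw data with bare `<>` markers would not. `DomRegexRewrite` and `rewrite` are hypothetical names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class DomRegexRewrite {
    /** Parses xml, applies regex/replacement to every text node, and serializes the result. */
    static String rewrite(String xml, String regex, String replacement) {
        try {
            // Step 1: read the XML with DOM (assumes well-formed input).
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // Step 2: regex replacement on text nodes only.
            replaceInText(doc.getDocumentElement(), regex, replacement);
            // Step 3: write the XML back out.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not process XML", e);
        }
    }

    private static void replaceInText(Node node, String regex, String replacement) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            node.setNodeValue(node.getNodeValue().replaceAll(regex, replacement));
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            replaceInText(children.item(i), regex, replacement);
        }
    }
}
```

For example, `DomRegexRewrite.rewrite("<book><name>ANSI  C</name></book>", "\\s+", " ")` collapses the repeated whitespace inside the name element. When writing text nodes, the serializer escapes characters such as & and < on its own, so no manual entity replacement is needed at that stage.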
punny answered Sep 19 '22 22:09