Replace an XML element's value? Sed regular expression?

Tags: regex, xml, sed

I want to take an XML file and replace an element's value. For example if my XML file looks like this:

<abc>
    <xyz>original</xyz>
</abc>

I want to replace the xyz element's original value, whatever it may be, with another string so that the resulting file looks like this:

<abc>
    <xyz>replacement</xyz>
</abc>

How would you do this? I know I could write a Java program to do this but I assume that that's overkill for replacing a single element's value and that this could be easily done using sed to do a substitution using a regular expression. However I'm less than novice with that command and I'm hoping some kind soul reading this will be able to spoon feed me the correct regular expression for the job.

One idea is to do something like this:

sed 's/<xyz>.*<\/xyz>/<xyz>replacement<\/xyz>/' <original.xml >new.xml

Maybe it's better for me to just replace the entire line of the file with what I want it to be, since I will know the element name and the new value I want to use? But this assumes that the element in question is on a single line and that no other XML data is on the same line. I'd rather have a command which will basically replace element xyz's value with a new string that I specify and not have to worry if the element is all on one line or not, etc.

If sed is not the best tool for this job then please dial me in to a better approach.

If anyone can steer me in the right direction I'll really appreciate it; you'll probably save me hours of trial and error. Thanks in advance!

--James

James Adams asked Aug 28 '09 16:08


2 Answers

sed is not going to be an easy tool to use for multi-line replacements. It's possible to implement them using its N command and a loop, checking after reading in each line whether the close of the tag has been found... but it's not pretty and you'll never remember it.
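For the curious, the N-based loop mentioned above might look roughly like the sketch below. It assumes GNU sed behaviour (in particular that . matches a newline embedded in the pattern space), and sample.xml is a throwaway file created only for illustration. The greedy .* would also over-match if two xyz elements ended up in the pattern space together, so treat this as a sketch rather than a robust solution.

```shell
# Create a small sample file whose <xyz> value spans two lines (hypothetical data).
cat > sample.xml <<'EOF'
<abc>
    <xyz>original
    value</xyz>
</abc>
EOF

# :a                 label to jump back to
# /<xyz>/{ ... }     only act once an opening tag is in the pattern space
# /<\/xyz>/!{ N; ba }  no closing tag yet: append the next line and loop
# s/.../.../         closing tag present: replace everything between the tags
result=$(sed -e ':a' \
    -e '/<xyz>/{' \
    -e '/<\/xyz>/!{' \
    -e 'N' \
    -e 'ba' \
    -e '}' \
    -e 's/<xyz>.*<\/xyz>/<xyz>replacement<\/xyz>/' \
    -e '}' sample.xml)

printf '%s\n' "$result"
```

Each -e fragment is treated as a separate script line, which avoids relying on how a particular sed parses labels followed by ; or }.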

Of course, actually parsing the XML and replacing the element's content is going to be the safest thing, but if you know you won't run into any problems, you could try this:

perl -p -0777 -e 's@<xyz>.*?</xyz>@<xyz>new-value</xyz>@sg' <xml-file>

Breaking this down:

  • -p tells it to loop through the input and print
  • -0777 tells it to treat the whole file as a single record instead of splitting on newlines, so that it gets the whole thing in one slurp
  • -e means here comes the stuff I want you to do

And the substitution itself:

  • use @ as a delimiter so you don't have to escape /
  • use *?, the non-greedy version, to match as little as possible, so we don't go all the way to the last occurrence of </xyz> in the file
  • use the s modifier to let . match newlines (to get the multiple-line tag values)
  • use the g modifier to match the pattern multiple times

Tada! This prints the result to stdout - once you verify it does what you want, add the -i option to tell it to edit the file in place.

Cascabel answered Nov 15 '22 09:11


OK so I bit the bullet and took the time to write a Java program which does what I want. Below is the operative method called by my main() method which does the work, in case this will be helpful to someone else in the future:

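import java.io.File;
import java.io.FileInputStream;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/**
 * Takes an input XML file, replaces the text value of the node(s) selected by an XPath expression, and writes a new
 * XML file with the updated data.
 *
 * @param inputXmlFilePathName path of the XML file to read
 * @param outputXmlFilePathName path of the XML file to write
 * @param elementXpathExpression XPath expression selecting the node(s) to update
 * @param elementValue new text value for the selected node(s)
 * @param replaceAllFoundElements if false, fail when the XPath matches more than one node
 */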
public static void replaceElementValue(final String inputXmlFilePathName,
                                       final String outputXmlFilePathName,
                                       final String elementXpathExpression,
                                       final String elementValue,
                                       final boolean replaceAllFoundElements)
{
    try
    {
        // get the template XML as a W3C Document Object Model which we can later write back as a file
        InputSource inputSource = new InputSource(new FileInputStream(inputXmlFilePathName));
        DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
        Document document = documentBuilderFactory.newDocumentBuilder().parse(inputSource);

        // create an XPath expression to access the element's node
        XPathFactory xpathFactory = XPathFactory.newInstance();
        XPath xpath = xpathFactory.newXPath();
        XPathExpression xpathExpression = xpath.compile(elementXpathExpression);

        // get the node(s) which correspond to the XPath expression and replace the value
        // (a NODESET evaluation yields an empty list, not null, when nothing matches)
        NodeList nodeList = (NodeList) xpathExpression.evaluate(document, XPathConstants.NODESET);
        if ((nodeList == null) || (nodeList.getLength() == 0))
        {
            throw new RuntimeException("Failed to find a node corresponding to the provided XPath.");
        }
        if ((nodeList.getLength() > 1) && !replaceAllFoundElements)
        {
            throw new RuntimeException("Found multiple nodes corresponding to the provided XPath and multiple replacements not specified.");
        }
        for (int i = 0; i < nodeList.getLength(); i++)
        {
            nodeList.item(i).setTextContent(elementValue);
        }

        // prepare the DOM document for writing
        Source source = new DOMSource(document);

        // prepare the output file
        File file = new File(outputXmlFilePathName);
        Result result = new StreamResult(file);

        // write the DOM document to the file
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.transform(source, result);
    }
    catch (Exception ex)
    {
        throw new RuntimeException("Failed to replace the element value.", ex);
    }
}

I run the program like so:

$ java -cp xmlutility.jar com.abc.util.XmlUtility input.xml output.xml '//name/text()' JAMES

James Adams answered Nov 15 '22 08:11