
Clean namespace handling with dom4j

We are using dom4j 1.6.1 to parse XML coming from an external source. Sometimes the tags carry a namespace (e.g. ) and sometimes they do not ( ), which makes calls to Element.selectSingleNode(String s) fail.

So far we have three solutions, and we are not happy with any of them:

1 - Remove all namespace occurrences before doing anything with the XML document

xml = xml.replaceAll("xmlns=\"[^\"]*\"", "");
xml = xml.replaceAll("ds:", "");
xml = xml.replaceAll("etm:", "");
[...] // and so on for each kind of namespace

2 - Remove the namespace just before getting a node, by calling

Element.remove(Namespace ns)

But it works only for the node itself and its first level of children.

3 - Clutter the code with a second lookup

Node node = rootElement.selectSingleNode(NameWithoutNameSpace);
if (node == null) {
    // Fall back to the namespace-qualified name
    node = rootElement.selectSingleNode(NameWithNameSpace);
}
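
Option 1 does not have to list every prefix by hand. The following is only a rough sketch of that idea using the JDK's regex classes; the class name NamespaceStripper and both patterns are hypothetical, not part of dom4j. It assumes the input contains no CDATA sections or text that merely looks like a prefixed tag, and it leaves prefixed attributes such as xsi:type untouched.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of dom4j: regex-based namespace stripping.
final class NamespaceStripper {
    // Matches xmlns="..." and xmlns:prefix="..." declarations (either quote style).
    private static final Pattern DECLARATION =
            Pattern.compile("\\s+xmlns(:[A-Za-z_][\\w.-]*)?\\s*=\\s*(\"[^\"]*\"|'[^']*')");

    // Matches the prefix of an opening or closing tag name, e.g. "<ds:" or "</ds:".
    private static final Pattern TAG_PREFIX =
            Pattern.compile("(</?)[A-Za-z_][\\w.-]*:");

    private NamespaceStripper() {
    }

    static String strip(String xml) {
        // Drop the declarations first, then the prefixes on element names.
        String withoutDeclarations = DECLARATION.matcher(xml).replaceAll("");
        return TAG_PREFIX.matcher(withoutDeclarations).replaceAll("$1");
    }
}
```

Like the replaceAll calls in option 1, this works on the raw string, so it shares the same weakness: it cannot tell markup from content that happens to look like markup.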

So, what do you think? Which one is the least bad? Do you have another solution to propose?

Antoine Claval asked Sep 14 '09 15:09


1 Answer

I wanted to remove all namespace information (declarations and tag prefixes) to make XPath evaluation easier. I ended up with this solution:

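String xml = ...
SAXReader reader = new SAXReader();
// Read from a Reader so parsing does not depend on the platform default charset
Document document = reader.read(new StringReader(xml));
// Strip namespace information from every node in the tree
document.accept(new NameSpaceCleaner());
return document.asXML();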

where NameSpaceCleaner is a dom4j visitor:

private static final class NameSpaceCleaner extends VisitorSupport {
    @Override
    public void visit(Document document) {
        // Reset the root element's namespace and drop its extra declarations
        ((DefaultElement) document.getRootElement())
                .setNamespace(Namespace.NO_NAMESPACE);
        document.getRootElement().additionalNamespaces().clear();
    }

    @Override
    public void visit(Namespace namespace) {
        namespace.detach();
    }

    @Override
    public void visit(Attribute node) {
        // Remove namespace declarations and xsi: attributes
        if (node.toString().contains("xmlns")
                || node.toString().contains("xsi:")) {
            node.detach();
        }
    }

    @Override
    public void visit(Element node) {
        if (node instanceof DefaultElement) {
            ((DefaultElement) node).setNamespace(Namespace.NO_NAMESPACE);
        }
    }
}
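
If rewriting the document is not desirable, the lookups themselves can be made namespace-agnostic by matching on local-name(), a standard XPath 1.0 function. The sketch below uses the JDK's built-in DOM and XPath classes so that it runs without extra jars; the class LocalNameLookup and the method firstText are hypothetical names. I would expect the same expression to work with dom4j's selectSingleNode, but that is an assumption I have not checked against 1.6.1.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper: finds an element by local name, whatever its namespace.
final class LocalNameLookup {
    private LocalNameLookup() {
    }

    // Returns the text of the first element whose local name matches.
    // localName is assumed to be a plain XML name without quote characters.
    static String firstText(String xml, String localName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            org.w3c.dom.Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // local-name() ignores both the prefix and the namespace URI
            String expression = "//*[local-name()='" + localName + "']";
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }
}
```

The same call then covers both the prefixed and the unprefixed form of a tag, at the cost of more verbose XPath expressions.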
mestachs answered Oct 06 '22 22:10