 

Jsoup: SelectorParseException when an XML tag contains a colon

An exception is thrown when an XML tag name contains a colon.

Exception:

org.jsoup.select.Selector$SelectorParseException: Could not parse query 'w:r': unexpected token at ':r'

XML:

<w:r>
  <w:rPr>
    <w:rStyle w:val="jid"/>
  </w:rPr>
  <w:t>AN</w:t>
</w:r>

Java code:

    org.jsoup.nodes.Document doc = Jsoup.parse(documentXmlString);

Here, documentXmlString holds the XML shown above.
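A minimal reproduction, as a sketch that assumes jsoup is on the classpath. The exception message refers to the query 'w:r', which suggests it was raised by a call such as doc.select("w:r") rather than by Jsoup.parse: in a CSS selector, ":" starts a pseudo-selector, so ":r" is an unexpected token. The class name ColonSelectorRepro and the helper selectorFails are illustrative only.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Selector;

public class ColonSelectorRepro {
    public static final String XML =
        "<w:r><w:rPr><w:rStyle w:val=\"jid\"/></w:rPr><w:t>AN</w:t></w:r>";

    // Parses the sample, then reports whether jsoup's selector parser rejects the query.
    public static boolean selectorFails(String query) {
        // Parsing succeeds: "w:r" is kept as an ordinary tag name.
        Document doc = Jsoup.parse(XML);
        try {
            doc.select(query);
            return false;
        } catch (Selector.SelectorParseException e) {
            // e.g. "Could not parse query 'w:r': unexpected token at ':r'"
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(selectorFails("w:r")); // true
    }
}
```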

Linda asked Nov 26 '12 06:11



3 Answers

Just replace ":" with "|" in the selector:

doc.select("w|r");

I'm using Jsoup 1.5.2.
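A sketch of this approach, assuming a jsoup version on the classpath that supports the ns|tag selector syntax (documented in jsoup's Selector reference): "w|r" matches elements whose tag name is "w:r". The class and method names are illustrative. Jsoup.parse(String) uses the HTML parser, which lower-cases tag names, so the query for <w:rStyle> is written in lower case here.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class NamespacedSelect {
    public static final String XML =
        "<w:r><w:rPr><w:rStyle w:val=\"jid\"/></w:rPr><w:t>AN</w:t></w:r>";

    // Collects the text of the <w:t> elements inside each <w:r> run.
    public static List<String> runTexts(String xml) {
        Document doc = Jsoup.parse(xml);
        List<String> texts = new ArrayList<>();
        // "w|r" is jsoup's selector syntax for the tag name "w:r"
        for (Element run : doc.select("w|r")) {
            texts.add(run.select("w|t").text());
        }
        return texts;
    }

    // Reads the w:val attribute of the first <w:rStyle>; attribute names may keep the colon.
    public static String styleOf(String xml) {
        return Jsoup.parse(xml).select("w|rstyle").attr("w:val");
    }

    public static void main(String[] args) {
        System.out.println(runTexts(XML)); // [AN]
        System.out.println(styleOf(XML));  // jid
    }
}
```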

Allen Chan answered Nov 11 '22 16:11



Though your workaround has worked for you, I would like to share some background on namespaces.

The w: in your XML is called a namespace prefix, and to use a namespace prefix it has to be declared, either on the element that uses it or on an ancestor such as the root node 1+. Since the declaration was missing from your source XML, a namespace-aware XML parser would reject it. Below is how to declare the namespace; I have corrected your own XML:

<w:r xmlns:w="http://www.w3.org/SomeNamespace">
  <w:rPr>
    <w:rStyle w:val="jid"/>
  </w:rPr>
  <w:t>AN</w:t>
</w:r>
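To see what the declaration changes, here is a sketch using the JDK's namespace-aware DOM parser (javax.xml.parsers), not jsoup: once xmlns:w is declared, the parser resolves the prefix w to the declared URI and reports r as the local name. The URI is the placeholder from the corrected XML above, and the class and method names are illustrative. Note that the declaration matters to namespace-aware parsers like this one; a jsoup query written as "w:r" is still rejected either way, because that error comes from the CSS selector syntax.

```java
import java.io.IOException;
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class DeclaredNamespace {
    // Placeholder namespace URI taken from the corrected XML above.
    public static final String NS = "http://www.w3.org/SomeNamespace";

    public static final String XML =
        "<w:r xmlns:w=\"" + NS + "\">"
        + "<w:rPr><w:rStyle w:val=\"jid\"/></w:rPr>"
        + "<w:t>AN</w:t></w:r>";

    // Parses with namespace processing switched on (the factory default is off).
    public static Document parseNamespaceAware(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element root = parseNamespaceAware(XML).getDocumentElement();
        System.out.println(root.getPrefix());       // w
        System.out.println(root.getLocalName());    // r
        System.out.println(root.getNamespaceURI()); // http://www.w3.org/SomeNamespace
        // Look up <w:t> by namespace URI and local name, not by prefix
        Element t = (Element) root.getElementsByTagNameNS(NS, "t").item(0);
        System.out.println(t.getTextContent());     // AN
    }
}
```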

Additional information:

A namespace declaration has its own scope, as in the example below:

<root>
    <w:r xmlns:w="http://www.w3.org/SomeNamespace">
      <w:rPr>
        <w:rStyle w:val="jid"/>
      </w:rPr>
      <w:t>AN</w:t>
    </w:r>
    <someotherElement>
      <dummychild/>
    </someotherElement>
</root>

In the example above, you cannot use the namespace prefix w on <someotherElement> or <dummychild/>, because the scope of the prefix w covers only the element <w:r> and its descendants (children, grandchildren, and so on).


1+: A namespace declared on an element is valid for that element and all of its descendants. Declaring the namespace on the root element therefore makes it available to every element in the XML document.
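The scope rule can be checked with the JDK's namespace-aware DOM parser, as a sketch: the prefix w is accepted on the declaring element and its descendants, but using it on an element outside that subtree is a fatal "prefix is not bound" error. The two sample documents and the class and method names are illustrative assumptions based on the example above.

```java
import java.io.IOException;
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class NamespaceScope {
    // w: is declared on <w:r> and used only inside it: in scope.
    public static final String INSIDE =
        "<root><w:r xmlns:w=\"http://www.w3.org/SomeNamespace\"><w:t>AN</w:t></w:r>"
        + "<someotherElement><dummychild/></someotherElement></root>";

    // Same document, but a sibling subtree also uses w: outside the declaring element.
    public static final String OUTSIDE =
        "<root><w:r xmlns:w=\"http://www.w3.org/SomeNamespace\"><w:t>AN</w:t></w:r>"
        + "<someotherElement><w:dummychild/></someotherElement></root>";

    // Returns true if a namespace-aware parser accepts the document.
    public static boolean parsesWithNamespaces(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // e.g. the prefix "w" for element "w:dummychild" is not bound
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parsesWithNamespaces(INSIDE));  // true
        System.out.println(parsesWithNamespaces(OUTSIDE)); // false
    }
}
```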

InfantPro'Aravind' answered Nov 11 '22 17:11



I used:

 documentXmlString = documentXmlString.replaceAll("w:","w");
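For reference, here is what this one-line workaround does, as a plain-Java sketch with no jsoup involved: replaceAll("w:", "w") renames <w:r> to <wr>, so a query such as "wr" no longer contains a colon, but it also rewrites attribute names like w:val and any "w:" inside text content. The narrower pattern in stripTagPrefix is a hypothetical alternative that only touches tag names; it is not part of the original answer.

```java
public class PrefixReplace {
    // The workaround from this answer: removes the colon after every "w", anywhere.
    public static String stripAll(String xml) {
        return xml.replaceAll("w:", "w");
    }

    // Hypothetical narrower variant: only strips the colon after "<w" or "</w".
    public static String stripTagPrefix(String xml) {
        return xml.replaceAll("(</?w):", "$1");
    }

    public static void main(String[] args) {
        String xml = "<w:r><w:rStyle w:val=\"jid\"/><w:t>new: AN</w:t></w:r>";
        System.out.println(stripAll(xml));
        // <wr><wrStyle wval="jid"/><wt>new AN</wt></wr>
        System.out.println(stripTagPrefix(xml));
        // <wr><wrStyle w:val="jid"/><wt>new: AN</wt></wr>
    }
}
```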
Linda answered Nov 11 '22 17:11
