I need to use the XPath function normalized-space() to normalize the text I want to extract from a XHTML document: http://test.anahnarciso.com/clean_bigbook_0.html
I'm using the following expression:
//*[@slot="address"]/normalize-space(.)
Which works perfectly in Qizx Studio, the tool I use to test XPath expressions.
let $doc := doc('http://test.anahnarciso.com/clean_bigbook_0.html')
return $doc//*[@slot="address"]/normalize-space(.)
This simple query returns a sequence of xs:string
.
144 Hempstead Tpke
403 West St
880 Old Country Rd
8412 164th St
8412 164th St
1 Irving Pl
1622 McDonald Ave
255 Conklin Ave
22011 Hempstead Ave
7909 Queens Blvd
11820 Queens Blvd
1027 Atlantic Ave
1068 Utica Ave
1002 Clintonville St
1002 Clintonville St
1156 Hempstead Tpke
Route 49
10007 Rockaway Blvd
12694 Willets Point Blvd
343 James St
Now, I want to use the previous expression in my Java code.
String exp = "//*[@slot=\"address"\"]/normalize-space(.)";
XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression expr = xpath.compile(exp);
Object result = expr.evaluate(doc, XPathConstants.NODESET);
But the last line throws an Exception:
Cannot convert XPath value to Java object: required class is org.w3c.dom.NodeList; supplied value has type xs:string
Obvsiously, I should change XPathConstants.NODESET
for something; I tried XPathConstants.STRING
but it only returns the first element of the sequence.
How can I obtain something like an array of Strings?
Thanks in advance.
Your expression works in XPath 2.0, but is illegal in XPath 1.0 (which is used in Java) - it should be normalize-space(//*[@slot='address'])
.
Anyway, in XPath 1.0, when normalize-space()
is called on a node-set, only the first node (in document order) is taken.
In order to do what you want to do, you'll need to use a XPath 2.0 compatible parser, or traverse the resulting node-set and call normalize-space()
on every node:
XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression expr;
String select = "//*[@slot='address']";
expr = xpath.compile(select);
NodeList result = (NodeList)expr.evaluate(input, XPathConstants.NODESET);
String normalize = "normalize-space(.)";
expr = xpath.compile(normalize);
int length = result.getLength();
for (int i = 0; i < length; i++) {
System.out.println(expr.evaluate(result.item(i), XPathConstants.STRING));
}
...outputs exactly your given output.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With