I have a malformed page to scrape, and have had a hard time getting the correct XPath for YQL. I can scrape individual fields that I need using, for example:
//*[@id="cell_12345"]
But what I really need to do is return all elements whose ID begins with cell_. Something like:
//*[@id="cell_"*]
How do I do this?
Also, if anybody can point me to a good XPath reference it would be very helpful.
Thanks!
The asterisk (*) is commonly used as a wildcard character in XPath expressions. For example, //input[@*='demo'] selects any "input" node that has at least one attribute, of any name, whose value is "demo".
XPath allows wildcards in path expressions, which makes them more robust where naming specific nodes is either impossible or undesirable. The * wildcard matches any element node.
The @* wildcard matches any attribute of the selected element. The node() test matches a node of any type (element, text, comment and so on) on the axis it is applied to.
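To make the wildcard behaviour concrete, here is a minimal sketch in Python using only the standard library. The sample markup and the variable names are invented for illustration. Note that xml.etree.ElementTree implements only a subset of XPath: it understands * for elements but not @*, so the attribute wildcard is emulated with a plain Python check rather than evaluated as XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup, invented for illustration.
SAMPLE = """
<form>
  <input id="a" name="demo"/>
  <input id="b" title="demo"/>
  <input id="c" name="other"/>
  <select id="d" name="demo"/>
</form>
"""

root = ET.fromstring(SAMPLE)

# `*` matches any element node: this selects every descendant element.
all_elements = root.findall(".//*")

# ElementTree has no `@*`, so //input[@*='demo'] is emulated by checking
# every attribute value of each <input> element.
inputs_with_demo = [
    el for el in root.iter("input")
    if "demo" in el.attrib.values()
]

print([el.tag for el in all_elements])
print([el.get("id") for el in inputs_with_demo])
```

The select element also carries the value "demo", but it is excluded because the expression names input explicitly; only the attribute name is wildcarded.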
Something like
//*[starts-with(@id, 'cell_')]
should do nicely.
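If it helps to see what that predicate selects, here is a rough sketch in Python. It assumes well-formed markup (a genuinely malformed page would need a more forgiving parser first), and the page fragment and the helper name ids_starting_with are made up for the example. The standard-library ElementTree does not implement XPath functions such as starts-with(), so the predicate is emulated in Python; a library with full XPath 1.0 support, such as lxml, can evaluate the expression directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment; the ids are invented for illustration.
PAGE = """
<table>
  <tr>
    <td id="cell_12345">first</td>
    <td id="cell_67890">second</td>
    <td id="header_1">not a cell</td>
    <td>no id at all</td>
  </tr>
</table>
"""


def ids_starting_with(root, prefix):
    """Emulate //*[starts-with(@id, prefix)] over root and its descendants."""
    # .iter() walks the element itself and all descendants in document order.
    # A missing id is treated as the empty string, as XPath does.
    return [el for el in root.iter() if el.get("id", "").startswith(prefix)]


root = ET.fromstring(PAGE)
cells = ids_starting_with(root, "cell_")
print([(el.get("id"), el.text) for el in cells])
```

Elements without an id attribute, and ids with a different prefix, are simply skipped, which is the same result the XPath predicate gives.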
As for an XPath reference, once you know the syntax and the axes, any function reference should help. This was the first one Google returned: http://www.w3schools.com/xpath/xpath_functions.asp