YQL Console Link
Query:
select * from html where url='http://www.cbs.com/shows/big_brother/video/' and xpath='//div[@id="cbs-video-metadata-wrapper"]/div[@class="cbs-video-share"]/a'
Returns:
<?xml version="1.0" encoding="UTF-8"?>
<query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng"
yahoo:count="1" yahoo:created="2011-07-09T23:14:02Z" yahoo:lang="en-US">
<diagnostics>
<publiclyCallable>true</publiclyCallable>
<url execution-time="146" proxy="DEFAULT"><![CDATA[http://www.cbs.com/shows/big_brother/video/]]></url>
<user-time>163</user-time>
<service-time>146</service-time>
<build-version>19262</build-version>
</diagnostics>
<results>
<a class="twitter-share-button" href="http://twitter.com/share"/>
</results>
</query>
Should Return Something Similar To:
<results>
<a href="http://twitter.com/share" data-url="http://www.cbs.com/shows/big_brother/video/2045825951/big-brother-episode-1" class="twitter-share-button"></a>
</results>
If I back out the query one level, it totally strips out the element, which I could also use to get the data I need.
In JavaScript, all attributes of an element are available through the node.attributes property, which can be used, for example, to collect all the attributes of a div into an object.
Elements in HTML have attributes; these are additional values that configure the elements or adjust their behavior in various ways to meet the criteria the users want.
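As a rough illustration of the node.attributes approach, here is a minimal sketch. The helper name attributesToObject is made up for this example, and it assumes the node exposes the usual array-like attributes collection of name/value pairs, as DOM elements do in browsers.

```javascript
// Hypothetical helper (not from the original post): copy every attribute
// of an element into a plain object keyed by attribute name.
// `node` is assumed to be a DOM Element, or any object exposing an
// array-like `attributes` collection of { name, value } entries.
function attributesToObject(node) {
  const result = {};
  for (const attr of Array.from(node.attributes)) {
    result[attr.name] = attr.value;
  }
  return result;
}

// In a browser this could be called as:
//   attributesToObject(document.querySelector("a.twitter-share-button"))
```

Custom attributes such as data-url show up in the result like any other attribute, provided the parser that built the node kept them in the first place.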
We have a new HTML parser that now recognizes custom attributes.
Add compat="html5" to the query to trigger the new parser, e.g.:
select * from html where url = "http://mydomain.com" and compat="html5"
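If the query is sent from code rather than the console, the compat="html5" clause just needs to be part of the encoded query string. The sketch below is only an illustration: the helper name buildYqlUrl is invented here, and the endpoint https://query.yahooapis.com/v1/public/yql with its q and format parameters is an assumption that is not confirmed anywhere in this thread.

```javascript
// Hypothetical helper: build a request URL for a YQL html-table query
// that opts into the HTML5 parser via compat="html5".
// ASSUMPTION: the public endpoint and its `q` / `format` parameters.
function buildYqlUrl(pageUrl, xpath) {
  // The xpath is wrapped in single quotes because XPath predicates
  // such as [@id="..."] usually contain double quotes themselves.
  const query =
    `select * from html where url="${pageUrl}" ` +
    `and xpath='${xpath}' and compat="html5"`;
  return (
    "https://query.yahooapis.com/v1/public/yql?q=" +
    encodeURIComponent(query) +
    "&format=xml"
  );
}

// Example call using the page and XPath from the question.
const requestUrl = buildYqlUrl(
  "http://www.cbs.com/shows/big_brother/video/",
  '//div[@id="cbs-video-metadata-wrapper"]/div[@class="cbs-video-share"]/a'
);
```

With the HTML5 parser enabled, the returned a element should keep its data-url attribute instead of having it stripped.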