Hey, I'd like to scrape some data from my blog using YQL:
SELECT * FROM html WHERE url="http://site.com/blog" AND xpath="//div[@class='post']"
How can I use more than one XPath expression in a single query? For example, can I do something like:
SELECT * FROM html WHERE url="http://site.com/blog" AND xpath="//div[@class='post']" AND xpath ="//div[@class='title']"
assuming I want to get both the post and the title? I could fetch all of the HTML, but I'd rather retrieve only what I need, as speed is an issue here.
Once I have the HTML, I want to extract the text from the markup. Is it OK to use PHP regular expressions for this?
I also understand that YQL accepts CSS selector syntax. If you have experience with this, could you show me how to write a query similar to the one above using CSS rather than XPath? I'd be grateful!
Thanks.
Regarding CSS:
See the YQL website itself for this. Search Google for YQL and CSS (I can only post one link here, and the second one is more useful).
The example they have there no longer works, but you can try this one, which scrapes the questions from the front page of Stack Overflow:
YQL example
Multiple selections with one XPath:
You can do this directly in XPath syntax by joining the expressions with the union operator (|), e.g.
SELECT * FROM html WHERE url="www.asscompact.de" and xpath="//head/meta[@name='title']|//head/meta[@name='description']|//head/meta[@name='keywords']"
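The | in the query above is XPath's union operator, so the same trick should work for the post/title case from the question. Here is a minimal sketch of how such a query string could be assembled. It is written in Python purely for illustration, build_union_query is a hypothetical helper name (not part of YQL), and the URL and class names are just the ones from the question:

```python
from urllib.parse import urlencode


def build_union_query(url, xpaths):
    """Join several XPath expressions with the union operator (|)
    and wrap them in a single YQL html-table query string.

    Hypothetical helper for illustration; not part of YQL itself.
    """
    if not xpaths:
        raise ValueError("need at least one XPath expression")
    for value in [url, *xpaths]:
        # The query wraps values in double quotes, so reject embedded ones.
        if '"' in value:
            raise ValueError("double quotes would break the YQL string literal")
    combined = "|".join(xpaths)
    return f'SELECT * FROM html WHERE url="{url}" AND xpath="{combined}"'


query = build_union_query(
    "http://site.com/blog",
    ["//div[@class='post']", "//div[@class='title']"],
)
print(query)

# URL-encoded form, suitable for passing as a q=... request parameter.
print(urlencode({"q": query}))
```

The first print shows one query with both expressions in a single xpath value, which is the shape the question was after instead of repeating AND xpath=... twice.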