Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to use multiple xpath selectors in a YQL query

Tags:

php

xpath

yql

Hey, I'd like to scrape some data from my blog using YQL:

SELECT * FROM html WHERE url="http://site.com/blog" AND xpath="//div[@class='post']"

How can I use different bits of xpath in my query? E.g. can I do something like:

SELECT * FROM html WHERE url="http://site.com/blog" AND xpath="//div[@class='post']" AND xpath ="//div[@class='title']"

assuming I want to get the post and the title? I guess I could take in all the HTML but I'd rather only take what I need as speed is an issue here.

Once I have the HTML I want to extract the text from the markup, is it OK to use PHP regular expressions for this?

I also understand you can use CSS syntax, if you have experience using this with YQL and could guide me in how I could write a similar query to the one above but in CSS rather than XPATH I'd be grateful!

Thanks.

like image 846
Umar Hansa Avatar asked Oct 13 '10 15:10

Umar Hansa


1 Answers

Regarding CSS:

See the YQL website itself for this. Search google for YQL and CSS (I can only post one link in here and the 2nd one is more useful.)

The example they have there is actually no longer working but you can try out this example, which scrapes the questions from the frontpage of stackoverflow.

YQL example

Multiple Selects with one XPATH:

You CAN do this directly with xpath syntax. e.g.

SELECT * FROM html WHERE url="www.asscompact.de" and xpath="//head/meta[@name='title']|//head/meta[@name='description']|//head/meta[@name='keywords']"
like image 114
spier Avatar answered Sep 21 '22 14:09

spier