This should be easy but I'm stuck.
<div class="paginationControl">
<a href="/en/overview/0-All_manufactures/0-All_models.html?page=2&powerunit=2">Link Text 2</a> |
<a href="/en/overview/0-All_manufactures/0-All_models.html?page=3&powerunit=2">Link Text 3</a> |
<a href="/en/overview/0-All_manufactures/0-All_models.html?page=4&powerunit=2">Link Text 4</a> |
<a href="/en/overview/0-All_manufactures/0-All_models.html?page=5&powerunit=2">Link Text 5</a> |
<!-- Next page link -->
<a href="/en/overview/0-All_manufactures/0-All_models.html?page=2&powerunit=2">Link Text Next ></a>
</div>
I'm trying to use Scrapy (BaseSpider) to select a link based on its link text, using:
nextPage = HtmlXPathSelector(response).select("//div[@class='paginationControl']/a/@href").re("(.+)*?Next")
For example, I want to select the next-page link based on the fact that its text is "Link Text Next". Any ideas?
You can use the following XPath expression:
//div[@class='paginationControl']/a[text()="Link Text Next"]/@href
This selects the href attribute of the link with the text "Link Text Next".
See XPath string functions if you need more control.
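If you want to try the exact-match idea outside a Scrapy project, Python's standard-library xml.etree.ElementTree is enough. Its limited XPath subset (Python 3.7+) supports [.='text'], which compares an element's full text. This is a sketch, not Scrapy: it assumes the snippet is well-formed XML (so the & in each href is escaped as &amp;), and the helper name hrefs_with_exact_text is made up for illustration. With the markup exactly as shown in the question, the link text is "Link Text Next >", so an exact comparison only matches when the trailing " >" is included.

```python
import xml.etree.ElementTree as ET

# The pagination markup from the question. "&" is escaped as "&amp;" so the
# snippet parses as XML (assumption: real pages are often not well-formed).
SNIPPET = """
<div class="paginationControl">
<a href="/en/overview/0-All_manufactures/0-All_models.html?page=2&amp;powerunit=2">Link Text 2</a> |
<a href="/en/overview/0-All_manufactures/0-All_models.html?page=3&amp;powerunit=2">Link Text 3</a> |
<a href="/en/overview/0-All_manufactures/0-All_models.html?page=4&amp;powerunit=2">Link Text 4</a> |
<a href="/en/overview/0-All_manufactures/0-All_models.html?page=5&amp;powerunit=2">Link Text 5</a> |
<!-- Next page link -->
<a href="/en/overview/0-All_manufactures/0-All_models.html?page=2&amp;powerunit=2">Link Text Next ></a>
</div>
"""


def hrefs_with_exact_text(div, text):
    """Hypothetical helper: hrefs of <a> children whose full text equals `text`."""
    # [.='...'] compares the element's complete text content exactly.
    return [a.get("href") for a in div.findall(f"a[.='{text}']")]


div = ET.fromstring(SNIPPET.strip())
print(hrefs_with_exact_text(div, "Link Text Next"))    # []
print(hrefs_with_exact_text(div, "Link Text Next >"))  # ['/en/overview/0-All_manufactures/0-All_models.html?page=2&powerunit=2']
```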
Use a[contains(text(),'Link Text Next')]:
nextPage = HtmlXPathSelector(response).select(
"//div[@class='paginationControl']/a[contains(text(),'Link Text Next')]/@href")
Reference: Documentation on the XPath contains function
PS. In your markup the link text does not end at "Link Text Next": it is followed by a space (and a ">"). With an exact match you would have to include those trailing characters in the code, for example:
text()="Link Text Next >"
I think using contains is a bit more general while still being specific enough.
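Here is a small standard-library sketch (not Scrapy) of what the contains() filter does. ElementTree's XPath subset has no contains(), so the substring test is done in Python; a.text holds the text directly inside the <a> element, which is what text() refers to for these links. The escaping of & as &amp; (so the snippet parses as XML) and the helper name hrefs_containing_text are assumptions for illustration. The second call shows why the fragment still has to be specific: "Link Text" alone matches every pagination link.

```python
import xml.etree.ElementTree as ET

# The question's markup, with "&" escaped as "&amp;" so it parses as XML.
SNIPPET = """
<div class="paginationControl">
<a href="/en/overview/0-All_manufactures/0-All_models.html?page=2&amp;powerunit=2">Link Text 2</a> |
<a href="/en/overview/0-All_manufactures/0-All_models.html?page=3&amp;powerunit=2">Link Text 3</a> |
<a href="/en/overview/0-All_manufactures/0-All_models.html?page=4&amp;powerunit=2">Link Text 4</a> |
<a href="/en/overview/0-All_manufactures/0-All_models.html?page=5&amp;powerunit=2">Link Text 5</a> |
<!-- Next page link -->
<a href="/en/overview/0-All_manufactures/0-All_models.html?page=2&amp;powerunit=2">Link Text Next ></a>
</div>
"""


def hrefs_containing_text(div, fragment):
    """Hypothetical helper emulating a[contains(text(), fragment)]/@href."""
    # ElementTree's XPath subset lacks contains(), so filter in Python.
    # a.text is the text directly inside <a> (None if the element is empty).
    return [a.get("href") for a in div.findall("a") if fragment in (a.text or "")]


div = ET.fromstring(SNIPPET.strip())
print(hrefs_containing_text(div, "Link Text Next"))  # ['/en/overview/0-All_manufactures/0-All_models.html?page=2&powerunit=2']
print(len(hrefs_containing_text(div, "Link Text")))  # 5
```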
Your XPath selects the href, not the text inside the a tag. From your example, none of the hrefs contain "Next", so you can't find the link by running a regex over them.
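To make the difference concrete, this minimal standard-library sketch (not Scrapy) applies the same "Next" regex once to the href values, which is what the question's XPath returns, and once to the link text. It assumes the markup is well-formed XML (so & is escaped as &amp;) and keeps only two of the links for brevity.

```python
import re
import xml.etree.ElementTree as ET

# Two links from the question's markup; "&" escaped as "&amp;" so it parses as XML.
SNIPPET = """
<div class="paginationControl">
<a href="/en/overview/0-All_manufactures/0-All_models.html?page=5&amp;powerunit=2">Link Text 5</a> |
<!-- Next page link -->
<a href="/en/overview/0-All_manufactures/0-All_models.html?page=2&amp;powerunit=2">Link Text Next ></a>
</div>
"""

div = ET.fromstring(SNIPPET.strip())
links = div.findall("a")

# Regex over the href values (what @href selects): none contain "Next".
from_hrefs = [a.get("href") for a in links if re.search(r"Next", a.get("href"))]
# Regex over the link text instead: matches the next-page link.
from_text = [a.get("href") for a in links if re.search(r"Next", a.text or "")]

print(from_hrefs)  # []
print(from_text)   # ['/en/overview/0-All_manufactures/0-All_models.html?page=2&powerunit=2']
```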