The HTML structure is like this:
<td class='hey'>
<a href="https://example.com">First one</a>
</td>
This is my selector:
m_URL = sel.css("td.hey a:nth-child(1)[href] ").extract()
My selector currently outputs <a href="https://example.com">First one</a>, but I only want it to output the link itself: https://example.com.
How can I do that?
You may try this:
m_URL = sel.css("td.hey a:nth-child(1)").xpath('@href').extract()
Get the ::attr(value) from the a tag.
Demo (using Scrapy shell):
$ scrapy shell index.html
>>> response.css('td.hey a:nth-child(1)::attr(href)').extract()
[u'https://example.com']
where index.html contains:
<table>
<tr>
<td class='hey'>
<a href="https://example.com">First one</a>
</td>
</tr>
</table>