Let's say I have some plain text in HTML-like format like this:
<div id="foo"><p id="bar">Some random text</p></div>
And I need to be able to run XPath on it to retrieve some inner element. How can I convert the plain text into some kind of object that I can run XPath queries on?
You can just create a normal Selector from the text and run XPath (or CSS) queries on it directly:
from scrapy import Selector
...
sel = Selector(text='<div id="foo"><p id="bar">Some random text</p></div>')
selected_xpath = sel.xpath('//div[@id="foo"]')
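The xpath() call returns a SelectorList; to pull out the actual text or markup you can use get() / getall(). A small usage sketch continuing the example above:

# Grab the inner <p> element's text via XPath
inner_text = sel.xpath('//div[@id="foo"]/p/text()').get()
print(inner_text)  # Some random text

# CSS queries work on the same Selector object
print(sel.css('p#bar::text').get())  # Some random text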
You can pass the HTML sample as a string to lxml.html and then query the parsed tree with XPath:
from lxml import html
code = """<div id="foo"><p id="bar">Some random text</p></div>"""
source = html.fromstring(code)
source.xpath('//div/p/text()')
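For the sample above, xpath() on an lxml element returns a plain Python list of matches, so you can print or index into it directly. A quick sketch of reading the results:

# xpath() returns a list of matching nodes/strings
print(source.xpath('//div/p/text()'))            # ['Some random text']
print(source.xpath('//p[@id="bar"]/text()')[0])  # Some random text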