Let's say I have some plain text in HTML-like format like this:
<div id="foo"><p id="bar">Some random text</p></div>
And I need to be able to run XPath on it to retrieve some inner element. How can I convert the plain text into some kind of object that I can run XPath queries on?
You can just create a normal Selector from the text and run XPath (or CSS) queries on it directly:
from scrapy import Selector
...
sel = Selector(text='<div id="foo"><p id="bar">Some random text</p></div>')
selected_xpath = sel.xpath('//div[@id="foo"]')
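The xpath() call returns a SelectorList; to pull out the actual text or markup you can use get() / getall(). A small usage sketch continuing the example above:

# Grab the inner <p> element's text via XPath
inner_text = sel.xpath('//div[@id="foo"]/p/text()').get()
print(inner_text)  # Some random text

# CSS queries work on the same Selector object
print(sel.css('p#bar::text').get())  # Some random text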
You can pass the HTML sample as a string to lxml.html and then query the parsed tree with XPath:
from lxml import html
code = """<div id="foo"><p id="bar">Some random text</p></div>"""
source = html.fromstring(code)
source.xpath('//div/p/text()')
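For the sample above, xpath() on an lxml element returns a plain Python list of matches, so you can print or index into it directly. A quick sketch of reading the results:

# xpath() returns a list of matching nodes/strings
print(source.xpath('//div/p/text()'))            # ['Some random text']
print(source.xpath('//p[@id="bar"]/text()')[0])  # Some random text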