
scrapy: convert html string to HtmlResponse object

I have a raw HTML string that I want to convert to a Scrapy HtmlResponse object so that I can use the css and xpath selectors on it, the same way I would on a regular Scrapy response. How can I do it?

yayu asked Dec 05 '14 19:12


1 Answer

First of all, if it is for debugging or testing purposes, you can use the Scrapy shell:

    $ cat index.html
    <div id="test">
        Test text
    </div>

    $ scrapy shell index.html
    >>> response.xpath('//div[@id="test"]/text()').extract()[0].strip()
    u'Test text'

Several objects are available during the shell session, such as response and request.


Alternatively, you can instantiate HtmlResponse directly and pass the HTML string as body. The url argument is required, but it can be any placeholder string:

    >>> from scrapy.http import HtmlResponse
    >>> response = HtmlResponse(url="my HTML string", body='<div id="test">Test text</div>', encoding='utf-8')
    >>> response.xpath('//div[@id="test"]/text()').extract()[0].strip()
    u'Test text'
alecxe answered Sep 22 '22 21:09