
Scrapy - TypeError: Cannot convert unicode body - HtmlResponse has no encoding

When I try to construct an HtmlResponse object in Scrapy like this:

scrapy.http.HtmlResponse(url=self.base_url + dealer_url[0], body=dealer_html)

I get this error:

Traceback (most recent call last):
  File "d:\kerja\hit\python~1\<project_name>\<project_name>\lib\site-packages\twisted\internet\defer.py", line 588, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "D:\Kerja\HIT\Python Projects\<project_name>\<project_name>\<project_name>\<project_name>\spiders\fwi.py", line 69, in parse_items
    dealer_page = scrapy.http.HtmlResponse(url=self.base_url + dealer_url[0], body=dealer_html)
  File "d:\kerja\hit\python~1\<project_name>\<project_name>\lib\site-packages\scrapy\http\response\text.py", line 27, in __init__
    super(TextResponse, self).__init__(*args, **kwargs)
  File "d:\kerja\hit\python~1\<project_name>\<project_name>\lib\site-packages\scrapy\http\response\__init__.py", line 18, in __init__
    self._set_body(body)
  File "d:\kerja\hit\python~1\<project_name>\<project_name>\lib\site-packages\scrapy\http\response\text.py", line 43, in _set_body
    type(self).__name__)
TypeError: Cannot convert unicode body - HtmlResponse has no encoding

Does anyone know how to solve this error?

Aminah Nuraini asked Dec 15 '22 04:12



1 Answer

HtmlResponse is trying to detect encoding:

The HtmlResponse class is a subclass of TextResponse which adds encoding auto-discovering support by looking into the HTML meta http-equiv attribute. See TextResponse.encoding.

So the HTML string you pass as the body parameter (dealer_html in your case) is a unicode string with no encoding specified. According to the W3 docs for http-equiv, the HTML should declare one like this:

HTML 4.01: <meta http-equiv="content-type" content="text/html; charset=UTF-8">
HTML5: <meta charset="UTF-8">
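To check whether a given HTML string declares a charset in either of the two forms above, something like the following standard-library sketch may help. The helper name declared_charset and the regular expression are illustrative assumptions, not part of Scrapy, and the pattern is deliberately loose rather than a full HTML parser.

```python
import re

# Loose pattern covering both declaration styles:
#   <meta charset="UTF-8">
#   <meta http-equiv="content-type" content="text/html; charset=UTF-8">
_META_CHARSET = re.compile(
    r"<meta[^>]*?charset\s*=\s*[\"']?\s*([A-Za-z0-9_.:-]+)",
    re.IGNORECASE,
)


def declared_charset(html):
    """Return the charset named in a <meta> tag, or None if none is found.

    Hypothetical helper for illustration only; `html` is expected to be a str.
    """
    match = _META_CHARSET.search(html)
    return match.group(1) if match else None
```

If this returns None for your dealer_html, the document carries no charset declaration of its own.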

In this case you can either fix your HTML or specify the encoding when creating the HtmlResponse object via the encoding parameter:

HtmlResponse(url='http://scrapy.org', body=u'some body', encoding='utf-8')
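Applied to the code in the question, one way to avoid the error is to always pass an encoding when the body is a text string. The sketch below assumes Python 3 and that dealer_html may be either str or bytes. The helper name html_response_kwargs and the variable full_url are made up for illustration, and the actual HtmlResponse call is left as a comment because it needs Scrapy installed.

```python
def html_response_kwargs(url, html, encoding="utf-8"):
    """Build keyword arguments for scrapy.http.HtmlResponse.

    Hypothetical helper: a str body gets an explicit encoding so it can be
    converted to bytes; a bytes body is passed through without one.
    """
    kwargs = {"url": url, "body": html}
    if isinstance(html, str):
        kwargs["encoding"] = encoding
    return kwargs


# Intended use inside the spider (requires Scrapy, so shown as a comment):
#   from scrapy.http import HtmlResponse
#   full_url = self.base_url + dealer_url[0]
#   dealer_page = HtmlResponse(**html_response_kwargs(full_url, dealer_html))
```

The default of utf-8 is an assumption; pass a different encoding if you know the text should be stored in another one.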
Granitosaurus answered Jan 10 '23 20:01
