Twisted critical unhandled error on Scrapy tutorial

I'm new to programming and I'm trying to learn Scrapy using the Scrapy tutorial: http://doc.scrapy.org/en/latest/intro/tutorial.html

So I ran the "scrapy crawl dmoz" command and got this error:

2015-07-14 16:11:02 [scrapy] INFO: Scrapy 1.0.1 started (bot: tutorial)
2015-07-14 16:11:02 [scrapy] INFO: Optional features available: ssl, http11
2015-07-14 16:11:02 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'tutorial.spiders', 'SPIDER_MODULES': ['tutorial.spiders'], 'BOT_NAME': 'tutorial'}

2015-07-14 16:11:05 [scrapy] INFO: Enabled extensions: CloseSpider, TelnetConsole, LogStats, CoreStats, SpiderState
Unhandled error in Deferred:
2015-07-14 16:11:06 [twisted] CRITICAL: Unhandled error in Deferred:
2015-07-14 16:11:07 [twisted] CRITICAL:

I'm using Windows 7 and Python 2.7. Does anybody know what the problem is, and how I could fix it?

EDIT: My spider code is:

# This package will contain the spiders of your Scrapy project
#
# Please refer to the documentation for information on how to create and manage
# your spiders.
import scrapy


class DmozSpider(scrapy.Spider):
    name = "dmoz"
    allowed_domains = ["dmoz.org"]
    start_urls = [
        "http://www.dmoz.org/computers/programming/languages/python/books/",
        "http://www.dmoz.org/computer/programming/languages/python/resources/"
    ]

    def parse(self, response):
        # Save each page's raw HTML to a file named after the last URL segment
        filename = response.url.split("/")[-2] + '.html'
        with open(filename, 'wb') as f:
            f.write(response.body)
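The parse() callback above names each output file after the second-to-last piece of the URL. A small standalone sketch of that same expression (the helper name filename_for is hypothetical, not part of Scrapy) shows which files a successful crawl would be expected to write:

```python
# Hypothetical helper mirroring the filename logic in DmozSpider.parse().
def filename_for(url):
    # The start URLs end with "/", so split("/")[-1] is an empty string;
    # [-2] picks the last real path segment instead ("books", "resources").
    return url.split("/")[-2] + ".html"


if __name__ == "__main__":
    for url in [
        "http://www.dmoz.org/computers/programming/languages/python/books/",
        "http://www.dmoz.org/computer/programming/languages/python/resources/",
    ]:
        print(filename_for(url))
```

Note that this relies on the trailing slash: for a URL such as http://example.com/a/b the expression would return a.html rather than b.html.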

items.py code:

import scrapy

class DmozItem(scrapy.Item):
    title = scrapy.Field()
    link = scrapy.Field()
    desc = scrapy.Field()

pip list:

  • bootstrap-admin (0.3.3)
  • cffi (1.1.2)
  • characteristic (14.3.0)
  • cryptography (0.9.3)
  • cssselect (0.9.1)
  • Django (1.7.7)
  • django-auth-ldap (1.2.4)
  • django-debug-toolbar (1.3.0)
  • django-mssql (1.6.2)
  • django-pyodbc (0.2.6)
  • django-pyodbc-azure (1.2.2)
  • django-redator (0.2.3)
  • django-reversion (1.8.5)
  • django-summernote (0.6.0)
  • django-windows-tools (0.1.1)
  • django-wysiwyg-redactor (0.4.3.2)
  • enum34 (1.0.4)
  • ez-setup (0.9)
  • flup (1.0.2)
  • idna (2.0)
  • ipaddress (1.0.13)
  • iso8601 (0.1.4)
  • logging (0.4.9.6)
  • lxml (3.4.4)
  • mechanize (0.2.5)
  • MySQL-python (1.2.4)
  • pbr (0.10.8)
  • Pillow (2.7.0)
  • pip (7.1.0)
  • pyasn1 (0.1.8)
  • pyasn1-modules (0.0.6)
  • pycparser (2.14)
  • pymongo (2.6)
  • pyodbc (3.0.7)
  • pyOpenSSL (0.15.1)
  • pypm (1.4.3)
  • python-ldap (2.4.18)
  • pythonselect (1.3)
  • pywin32 (218.3)
  • queuelib (1.2.2)
  • Scrapy (1.0.1)
  • selenium (2.44.0)
  • service-identity (14.0.0)
  • setuptools (18.0.1)
  • six (1.9.0)
  • sqlparse (0.1.15)
  • stevedore (1.3.0)
  • Twisted (15.2.1)
  • virtualenv (1.11.6)
  • virtualenv-clone (0.2.5)
  • virtualenvwrapper (4.3.2)
  • virtualenvwrapper-powershell (12.7.8)
  • w3lib (1.11.0)
  • xlrd (0.9.2)
  • zope.interface (4.1.2)

Thanks for your attention, and sorry for my poor English; it isn't my native language.

asked Sep 27 '22 07:09 by Vinicius de Castro


1 Answer

I'm also beginning to learn Scrapy and ran into the same problem. After struggling with it for an afternoon, I finally found that it was caused by the pywin32 module having been downloaded but not fully installed. Try running the command below in cmd to finish the pywin32 installation, then run the crawl again:

python python27\scripts\pywin32_postinstall.py -install
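If you want to confirm that pywin32 is the culprit before re-running the crawl, a rough check is to try importing win32api (one of the modules pywin32 provides) from the same interpreter Scrapy uses. The sketch below assumes the post-install script lives in the Scripts folder under sys.prefix, which is typical for a standard Windows Python 2.7 install but may differ inside a virtualenv; the helper names module_available and postinstall_command are hypothetical:

```python
from __future__ import print_function

import importlib
import os
import sys


def module_available(name):
    """Return True if `name` can be imported by this interpreter."""
    try:
        importlib.import_module(name)
    except ImportError:
        # Covers both "No module named ..." and "DLL load failed".
        return False
    return True


def postinstall_command():
    # Assumption: pywin32 places its post-install script in the Scripts
    # directory under the interpreter's prefix (e.g. C:\Python27\Scripts).
    script = os.path.join(sys.prefix, "Scripts", "pywin32_postinstall.py")
    return '"{0}" "{1}" -install'.format(sys.executable, script)


if __name__ == "__main__":
    # win32api is one of the modules shipped with pywin32.
    if module_available("win32api"):
        print("win32api imports fine; pywin32 looks installed.")
    else:
        print("win32api could not be imported; try running:")
        print(postinstall_command())
```

Run it with the same python.exe that runs Scrapy, since a different interpreter has its own site-packages and could give a misleading result.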

I hope this helps!

answered Oct 02 '22 14:10 by Luze