
Can't make standalone binary scrapy spider with cx_Freeze

A short description of my working environment: Windows 7 x64, Python 2.7 x64, Scrapy 0.22, cx_Freeze 4.3.2.

First, I developed a simple crawl spider, and it works fine. Then, using the core Scrapy API, I created an external script, main.py, which runs the spider; it also works as required. Here is the code of the script:

# external main.py using scrapy core API, 'test' is just replaced name of my project
from twisted.internet import reactor
from scrapy.crawler import Crawler
from scrapy import log, signals
from test.spiders.testSpider import TestSpider
from test import settings, pipelines
from scrapy.utils.project import get_project_settings

spider = TestSpider(domain='test.com')
settings = get_project_settings()
crawler = Crawler(settings)
crawler.signals.connect(reactor.stop, signal=signals.spider_closed)
crawler.configure()
crawler.crawl(spider)
crawler.start()
log.start()
reactor.run()

So now I'm trying to build a binary from all of this with cx_Freeze, using a setup.py like the one in another topic here. Here is the code:

from cx_Freeze import setup, Executable

includes = ['scrapy', 'pkg_resources', 'lxml.etree', 'lxml._elementpath']

build_options = {'compressed' : True,
                'optimize' : 2,
                'namespace_packages' : ['zope', 'scrapy', 'pkg_resources'],
                'includes' : includes,
                'excludes' : []}

executable = Executable(script='main.py',
                        copyDependentFiles=True,
                        includes=includes)

setup(name='Stand-alone scraper',
      version='0.1',
      description='Stand-alone scraper',
      options= {'build_exe': build_options},
      executables=[executable])

It compiles into an exe file without errors. The problems start when I try to run it:

Traceback (most recent call last):
  File "C:\Python27\lib\site-packages\cx_Freeze\initscripts\Console.py", line 27, in <module>
    exec code in m.__dict__
  File "main.py", line 2, in <module>
    from scrapy.crawler import Crawler
  File "C:\Python27\lib\site-packages\scrapy\__init__.py", line 6, in <module>
    __version__ = pkgutil.get_data(__package__, 'VERSION').strip()
  File "C:\Python27\lib\pkgutil.py", line 591, in get_data
    return loader.get_data(resource_name)
IOError: [Errno 2] No such file or directory: 'scrapy\\VERSION'

I solved this problem by moving the scrapy\VERSION file from the original source (python\lib\site-packages\scrapy) into library.zip\scrapy in the build folder. On the second run of main.exe I got another message:

Traceback (most recent call last):
  File "C:\Python27\lib\site-packages\cx_Freeze\initscripts\Console.py", line 27, in <module>
    exec code in m.__dict__
  File "main.py", line 11, in <module>
    crawler = Crawler(settings)
  File "C:\Python27\lib\site-packages\scrapy\crawler.py", line 20, in __init__
    self.stats = load_object(settings['STATS_CLASS'])(self)
  File "C:\Python27\lib\site-packages\scrapy\utils\misc.py", line 42, in load_object
    raise ImportError("Error loading object '%s': %s" % (path, e))
ImportError: Error loading object 'scrapy.statscol.MemoryStatsCollector': No module named statscol

I couldn't find a solution for this, so I tried importing the module named in the error message in my main.py. In short, it didn't work: each new import produced a new error about another module (I tried importing 15 modules in total) until I hit an error about the aes module in cryptography. I also tried cx_Freeze alternatives such as py2exe and PyInstaller, with the same result.
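
The second traceback hints at why adding imports one at a time never ends: Scrapy resolves classes such as scrapy.statscol.MemoryStatsCollector from dotted-path strings in its settings at run time, so a freezer that follows import statements has nothing to follow. Below is a simplified sketch of that pattern using only the standard library. It is an illustration, not Scrapy's actual load_object source, and collections.OrderedDict merely stands in for a settings entry like STATS_CLASS.

```python
import importlib


def load_object(path):
    # Split 'package.module.Name' into the module path and the attribute name.
    module_path, _, name = path.rpartition(".")
    # The module name exists only as a string here, so static analysis of
    # the calling script's import statements never sees it.
    module = importlib.import_module(module_path)
    return getattr(module, name)


# Stand-in for a settings value such as STATS_CLASS.
cls = load_object("collections.OrderedDict")
print(cls.__name__)
```

Any module reached only this way has to be named explicitly in the freezer's configuration, or its whole package has to be included.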

Can anybody help me solve this problem? Thank you for reading to this point.
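
For the first error, the manual move of the VERSION file could probably be automated in setup.py. Here is a minimal sketch, assuming the zip_includes option of build_exe accepts (source, archive_name) pairs in this cx_Freeze version as its documentation describes. zip_include_pairs is a hypothetical helper, not part of cx_Freeze or Scrapy.

```python
import os


def zip_include_pairs(package_dir, package_name, filenames):
    # Hypothetical helper: pair each data file that exists on disk with the
    # path it should get inside library.zip (zip archives use '/').
    pairs = []
    for name in filenames:
        source = os.path.join(package_dir, name)
        if os.path.isfile(source):
            pairs.append((source, package_name + "/" + name))
    return pairs


# Possible use in setup.py (assumes scrapy is importable at build time):
#   import scrapy
#   scrapy_dir = os.path.dirname(scrapy.__file__)
#   build_options["zip_includes"] = zip_include_pairs(
#       scrapy_dir, "scrapy", ["VERSION"])
```

Files that are missing are skipped silently, so the same list can be reused across Scrapy versions whose data files differ.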

Karen Oganesyan asked Nov 01 '22 02:11


1 Answer

Replace your cx_Freeze setup script with this:

import sys
from cx_Freeze import setup, Executable

# "packages" includes each listed package together with all of its submodules
build_exe_options = {"packages": ["os", "twisted", "scrapy", "test"],
                     "excludes": ["tkinter"],
                     "include_msvcr": True}

base = None  # None selects the default console base
setup(name="MyScript",
      version="0.1",
      description="Demo",
      options={"build_exe": build_exe_options},
      executables=[Executable("C:\\MyScript", base=base)])

The difference is that I include the packages whole (via the "packages" option), so every module in them is available in the frozen build, not only the ones your script imports directly.
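
To make that concrete, here is a rough sketch of the set of modules a whole-package include has to cover, built with the standard-library pkgutil.walk_packages. list_submodules is a hypothetical helper, json is only a stand-in for scrapy, and this is not how cx_Freeze itself collects modules.

```python
import importlib
import pkgutil


def list_submodules(package_name):
    # Import the top-level package, then walk its directories on disk.
    package = importlib.import_module(package_name)
    prefix = package.__name__ + "."
    # Each item is a (finder, name, ispkg) tuple; index 1 is the full name.
    return sorted(
        info[1] for info in pkgutil.walk_packages(package.__path__, prefix)
    )


# 'json' is a stand-in; in the setup script the package would be 'scrapy'.
for name in list_submodules("json"):
    print(name)
```

Passing names like these through "includes" one by one is what the question attempted by hand; "packages" avoids maintaining such a list.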

user3491776 answered Nov 13 '22 03:11