
Scrapyd can't find the code in a sub-directory

Tags:

scrapy

scrapyd

We have a fairly normal Scrapy project, something like this:

project/
       setup.py
       scrapy.cfg
       SOME_DIR_WITH_PYTHON_MODULE/
                                  __init__.py
       project/
              settings.py
              pipelines.py
              __init__.py
              spiders/
                     __init__.py
                     somespider.py

Everything works great if we run it from the command line with scrapy crawl somespider...

But when we deploy it and run it with Scrapyd, it fails to import the code from SOME_DIR_WITH_PYTHON_MODULE. For some unknown reason it just doesn't see the code there.

We tried to import it in the pipelines.py file, like this:

from project.SOME_DIR_WITH_PYTHON_MODULE import *

and like this:

from SOME_DIR_WITH_PYTHON_MODULE import *

...and neither worked, even though both imports work fine when the spider is run directly from the command line with scrapy crawl.
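One way to narrow this down is to check what setuptools itself would bundle into the egg that Scrapyd deploys. The sketch below recreates the question's layout in a scratch directory (the directory names come from the question; everything else is hypothetical) and asks find_packages() what it sees:

```python
import os
import tempfile

from setuptools import find_packages

# Recreate the question's layout in a scratch directory to check which
# packages setuptools would collect into the deployed egg.
root = tempfile.mkdtemp()
for pkg in ("project", os.path.join("project", "spiders"),
            "SOME_DIR_WITH_PYTHON_MODULE"):
    os.makedirs(os.path.join(root, pkg))
    open(os.path.join(root, pkg, "__init__.py"), "w").close()

found = sorted(find_packages(root))
print(found)  # ['SOME_DIR_WITH_PYTHON_MODULE', 'project', 'project.spiders']
```

Since the directory does have an __init__.py, a plain find_packages() call should normally pick it up, which makes a packaging detail (what actually lands inside the built egg) the next thing to inspect.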

What should we do to make it work?

Thanks!

Spaceman asked Nov 10 '22 19:11


1 Answer

Actually, I found the reason: I should have used the data_files parameter:

import itertools
import os

from setuptools import find_packages, setup

setup(
    name='blabla',
    version='1.0',
    packages=find_packages(),
    entry_points={'scrapy': ['settings = blabla.settings']},
    zip_safe=False,
    include_package_data=True,
    # Ship the extra directories as data files; each entry is a
    # (target_dir, [files_in_that_dir]) tuple.
    data_files=[(root, [os.path.join(root, f) for f in files])
         for root, _, files in itertools.chain(os.walk('monitoring'),
                                               os.walk('blabla/data'))],
    install_requires=[
        "Scrapy>=0.22",
    ],
    extras_require={
        'Somemodule': ["numpy"],
    },
)

That's a bit weird, because what we're shipping as "data" is actually code... but it worked for us.
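The data_files comprehension from the answer can be exercised on its own to see exactly what it produces. This sketch rebuilds the two directories from the answer in a scratch location (the file names inside them are made up):

```python
import itertools
import os
import tempfile

# Scratch tree mirroring the answer's directories (file names are invented).
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "monitoring"))
os.makedirs(os.path.join(root, "blabla", "data"))
open(os.path.join(root, "monitoring", "check.py"), "w").close()
open(os.path.join(root, "blabla", "data", "table.csv"), "w").close()

os.chdir(root)  # the walked paths are relative, as they would be in setup.py
data_files = [(d, [os.path.join(d, f) for f in files])
              for d, _, files in itertools.chain(
                  os.walk("monitoring"),
                  os.walk(os.path.join("blabla", "data")))]
print(data_files)
```

Each tuple pairs a destination directory with the files to copy there, so every file under both trees ends up inside the deployed egg, which is why this works around the missing-module problem even though the payload is code rather than data.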

Thanks for your attention. Solved.

Spaceman answered Jan 04 '23 03:01