
Scrapy: Pass arguments to cmdline.execute()

I know how to pass arguments when running a scrapy spider from the command line. However, I'm having problems when trying to run it programmatically from a script using scrapy's cmdline.execute().

The arguments I need to pass are lists that I previously formatted as strings, just like this:

numbers = "one,two,three,four,five"
colors = "red,blue,black,yellow,pink"

cmdline.execute('scrapy crawl myspider -a arg1='+numbers+' -a arg2='+colors)

and the spider is...

    from scrapy import Spider

    class MySpider(Spider):

        name = "myS"

        def __init__(self, arg1, arg2):
            super(MySpider, self).__init__()
            # Rest of the code

However, when I run it I get this error:

  Traceback (most recent call last):
  File "C:/Users/ME/projects/script.py", line 207, in run
    cmdline.execute("scrapy crawl myS -a arg1="+numbers+" -a data="+colors)
  File "C:\Python27\lib\site-packages\scrapy\cmdline.py", line 123, in execute
    cmdname = _pop_command_name(argv)
  File "C:\Python27\lib\site-packages\scrapy\cmdline.py", line 57, in _pop_command_name
    del argv[i]
TypeError: 'str' object doesn't support item deletion

Any ideas?

OS: Windows7; Python version: 2.7.8

lolog asked Feb 05 '15 21:02


People also ask

How are arguments passed in Scrapy?

The spider will receive the arguments in its constructor. Scrapy sets all the arguments as spider attributes, so you can skip the init method completely. Be aware that you should use the getattr method to read those attributes so your code does not break when one is missing. Succinct, robust and flexible!
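The attribute approach above can be sketched in plain Python (a minimal sketch that mimics how Scrapy turns each -a key=value pair into a spider attribute; the split_args method and its argument names are hypothetical, not part of Scrapy's API):

```python
class MySpider(object):
    # Mimics Scrapy's behaviour of setting each -a key=value pair
    # as an instance attribute (no scrapy import needed for the sketch)
    def __init__(self, **kwargs):
        for key, value in kwargs.items():
            setattr(self, key, value)

    def split_args(self):
        # getattr with a default keeps the code working even when an
        # argument was not passed on the command line
        numbers = getattr(self, 'arg1', '').split(',')
        colors = getattr(self, 'arg2', '').split(',')
        return numbers, colors
```

For example, MySpider(arg1='one,two').split_args() returns (['one', 'two'], ['']), with the empty default preventing an AttributeError for the missing arg2.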

How do you use Scrapy in CMD?

You can start by running the Scrapy tool with no arguments and it will print some usage help and the available commands:

    Scrapy X.Y - no active project
    Usage: scrapy <command> [options] [args]
    Available commands:
      crawl    Run a spider
      fetch    Fetch a URL using the Scrapy downloader
      [...]

How do you run Scrapy in a script?

The key to running scrapy in a python script is the CrawlerProcess class. This is a class of the Crawler module, and it provides the engine to run scrapy within a python script. Within the CrawlerProcess class code, python's Twisted framework is imported.

Does Scrapy use LXML?

It uses lxml library under the hood, and implements an easy API on top of lxml API. It means Scrapy selectors are very similar in speed and parsing accuracy to lxml.


1 Answer

The execute() function expects a list of arguments, not a string. Try this:

cmdline.execute([
    'scrapy', 'crawl', 'myspider',
    '-a', 'arg1='+numbers, '-a', 'arg2='+colors])
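If you prefer to keep the command as a single string, another option (a sketch using the standard-library shlex module; this simply builds the argv list that execute() expects) is to split the string first:

```python
import shlex

numbers = "one,two,three,four,five"
colors = "red,blue,black,yellow,pink"

# shlex.split turns the command string into the list of arguments
# that cmdline.execute() expects, respecting shell-style quoting
argv = shlex.split('scrapy crawl myspider -a arg1=' + numbers + ' -a arg2=' + colors)
# cmdline.execute(argv)  # argv is now a list, so this should not raise TypeError
```

shlex.split is preferable to a plain str.split(' ') when argument values may contain quoted spaces.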
Andrea Corbellini answered Sep 28 '22 10:09