I know how to pass arguments when running a Scrapy spider from the command line. However, I'm having problems when trying to run it programmatically from a script using Scrapy's cmdline.execute().
The arguments I need to pass are lists that I previously formatted as strings, just like this:
numbers = "one,two,three,four,five"
colors = "red,blue,black,yellow,pink"
cmdline.execute('scrapy crawl myspider -a arg1='+numbers+' -a arg2='+colors)
and the spider is...
from scrapy import Spider

class MySpider(Spider):
    name = "myS"

    def __init__(self, arg1, arg2):
        super(MySpider, self).__init__()
        # Rest of the code
However, when I run it I get this error:
Traceback (most recent call last):
  File "C:/Users/ME/projects/script.py", line 207, in run
    cmdline.execute("scrapy crawl myS -a arg1="+numbers+" -a data="+colors)
  File "C:\Python27\lib\site-packages\scrapy\cmdline.py", line 123, in execute
    cmdname = _pop_command_name(argv)
  File "C:\Python27\lib\site-packages\scrapy\cmdline.py", line 57, in _pop_command_name
    del argv[i]
TypeError: 'str' object doesn't support item deletion
Any ideas?
OS: Windows 7; Python version: 2.7.8
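The last frame of the traceback explains the failure: Scrapy runs `del argv[i]` on whatever was passed as `argv`, and Python strings are immutable. The snippet below is a minimal reproduction of that operation outside Scrapy; `pop_first` is a hypothetical helper, not Scrapy's actual `_pop_command_name`.

```python
def pop_first(argv):
    """Hypothetical helper: delete argv[0] in place, the same kind of
    operation the traceback shows (del argv[i])."""
    del argv[0]
    return argv


# A list supports item deletion, so this works.
print(pop_first(["scrapy", "crawl", "myS"]))  # ['crawl', 'myS']

try:
    # A str does not support item deletion, so this raises TypeError,
    # matching the error in the traceback above.
    pop_first("scrapy crawl myS")
except TypeError as exc:
    print("TypeError:", exc)
```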
The spider receives the arguments in its constructor. Scrapy sets every argument as a spider attribute, so you can skip the __init__ method completely. Use getattr to read those attributes so your code does not break when an argument is missing. Succinct, robust and flexible!
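A rough sketch of the getattr pattern, assuming the behavior described above (Scrapy's base Spider copies keyword arguments onto the instance). To keep it runnable without Scrapy, `FakeSpider` is a hypothetical stand-in for `scrapy.Spider`, and the `numbers` method is an illustrative name; in a real spider you would subclass `scrapy.Spider` and call getattr wherever the argument is needed.

```python
class FakeSpider(object):
    """Hypothetical stand-in for scrapy.Spider: it copies every keyword
    argument onto the instance as an attribute."""

    name = "myS"

    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)

    def numbers(self):
        # getattr with a default means a missing "-a arg1=..." does not
        # raise AttributeError; the spider just gets an empty list.
        raw = getattr(self, "arg1", "")
        # -a values arrive as strings, so split the comma-separated list.
        return raw.split(",") if raw else []


print(FakeSpider(arg1="one,two,three").numbers())  # ['one', 'two', 'three']
print(FakeSpider().numbers())  # []
```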
Using the scrapy tool: you can start by running the Scrapy tool with no arguments and it will print some usage help and the available commands:
Scrapy X.Y - no active project

Usage:
  scrapy <command> [options] [args]

Available commands:
  crawl         Run a spider
  fetch         Fetch a URL using the Scrapy downloader
[...]
Basic script: the key to running Scrapy in a Python script is the CrawlerProcess class, which lives in the scrapy.crawler module. It provides the engine to run Scrapy within a Python script, and it is built on Python's Twisted framework, which it imports and runs internally.
Scrapy's selectors use the lxml library under the hood and implement an easy API on top of the lxml API, so they are very similar to lxml in speed and parsing accuracy.
The execute() function expects a list of arguments (like sys.argv), not a single string; that is why the del argv[i] in _pop_command_name fails on a str. Try this:
from scrapy import cmdline

# Use the spider's name attribute ("myS"), one list item per argument
cmdline.execute([
    'scrapy', 'crawl', 'myS',
    '-a', 'arg1=' + numbers,
    '-a', 'arg2=' + colors,
])
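If the command is already built as a string, one option is to let the standard library's `shlex.split` turn it into the list that execute() wants. The sketch below also shows a hypothetical helper, `crawl_argv`, that builds the same list from keyword arguments; neither is part of Scrapy, and the final execute() call is left as a comment so the snippet runs without Scrapy installed.

```python
import shlex


def crawl_argv(spider_name, **spider_args):
    """Hypothetical helper: build the argv list for `scrapy crawl`,
    emitting one '-a key=value' pair per spider argument."""
    argv = ["scrapy", "crawl", spider_name]
    for key in sorted(spider_args):  # sorted for a stable order
        argv += ["-a", "%s=%s" % (key, spider_args[key])]
    return argv


numbers = "one,two,three,four,five"
colors = "red,blue,black,yellow,pink"

argv = crawl_argv("myS", arg1=numbers, arg2=colors)

# Splitting the original command string on whitespace gives the same list.
assert argv == shlex.split("scrapy crawl myS -a arg1=" + numbers + " -a arg2=" + colors)
print(argv)

# With Scrapy installed:
# from scrapy import cmdline
# cmdline.execute(argv)
```

Note that shlex.split only splits on whitespace here, so values containing spaces would need quoting in the string form; building the list directly avoids that issue.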