Trying to get Scrapy into a project to run Crawl command

I'm new to Python and Scrapy, and I'm working through the Scrapy tutorial. I was able to create my project from the Windows command prompt by typing:

scrapy startproject dmoz

The tutorial later refers to the Crawl command:

scrapy crawl dmoz.org

But each time I try to run it, I get a message that this is not a valid command. From looking around further, it seems I need to be inside a project, and that's the part I can't figure out. I've tried changing directories into the "dmoz" folder created by startproject, but from there Scrapy isn't recognized at all.

I'm sure I'm missing something obvious and I'm hoping someone can point it out.

Adam Smith asked Feb 14 '11

2 Answers

You have to run it inside the folder that startproject created. Scrapy exposes additional commands when it finds your scrapy.cfg file. You can see the difference here:

$ scrapy startproject bar
$ cd bar/
$ ls
bar  scrapy.cfg
$ scrapy
Scrapy 0.12.0.2536 - project: bar

Usage:
  scrapy <command> [options] [args]

Available commands:
  crawl         Start crawling from a spider or URL
  deploy        Deploy project in Scrapyd target
  fetch         Fetch a URL using the Scrapy downloader
  genspider     Generate new spider using pre-defined templates
  list          List available spiders
  parse         Parse URL (using its spider) and print the results
  queue         Deprecated command. See Scrapyd documentation.
  runserver     Deprecated command. Use 'server' command instead
  runspider     Run a self-contained spider (without creating a project)
  server        Start Scrapyd server for this project
  settings      Get settings values
  shell         Interactive scraping console
  startproject  Create new project
  version       Print Scrapy version
  view          Open URL in browser, as seen by Scrapy

Use "scrapy <command> -h" to see more info about a command


$ cd ..
$ scrapy
Scrapy 0.12.0.2536 - no active project

Usage:
  scrapy <command> [options] [args]

Available commands:
  fetch         Fetch a URL using the Scrapy downloader
  runspider     Run a self-contained spider (without creating a project)
  settings      Get settings values
  shell         Interactive scraping console
  startproject  Create new project
  version       Print Scrapy version
  view          Open URL in browser, as seen by Scrapy

Use "scrapy <command> -h" to see more info about a command
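Scrapy decides whether you are "inside a project" by looking for scrapy.cfg in the current directory and its parents. A minimal sketch of that lookup (a hypothetical helper for illustration, not Scrapy's actual code):

```python
import os

def find_scrapy_cfg(start):
    """Walk upward from `start`, returning the path to the first
    scrapy.cfg found (the file that marks a project root), or None."""
    path = os.path.abspath(start)
    while True:
        candidate = os.path.join(path, "scrapy.cfg")
        if os.path.isfile(candidate):
            return candidate
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            return None
        path = parent
```

So running `scrapy` from any subdirectory of the project (e.g. inside bar/bar/spiders) still works, but running it from outside the project tree gives you only the global commands.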
anders answered Oct 26 '22


Your PATH environment variable isn't set.

You can set the PATH environment variable for both Python and Scrapy by opening System Properties (My Computer > Properties > Advanced System Settings), navigating to the Advanced tab, and clicking the Environment Variables button. In the new window, select the Path variable under System Variables and append the following entries, separated by semicolons:

C:\{path to python folder}
C:\{path to python folder}\Scripts

example

C:\Python27;C:\Python27\Scripts
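After editing Path you can sanity-check the value. A small sketch of that check (a hypothetical helper, assuming the Windows-style ";" separator and case-insensitive paths):

```python
def dir_on_path(directory, path_value, sep=";"):
    """Return True if `directory` appears as an entry in a PATH-style
    string. Windows paths are case-insensitive, so compare lowercased."""
    entries = [p.strip().lower() for p in path_value.split(sep)]
    return directory.lower() in entries

# The Scripts folder must be listed for the `scrapy` command to be found:
print(dir_on_path(r"C:\Python27\Scripts", r"C:\Python27;C:\Python27\Scripts"))
```

Remember to open a new command prompt after changing Path; existing windows keep the old value.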

Akersh answered Oct 26 '22