Starting Scrapy from a Django view

Tags: python, django, web-scraping, scrapy

My experience with Scrapy is limited, and I have only ever used it through terminal commands. How can I pass form data (a URL to be scraped) from my Django template to Scrapy so that it starts scraping? So far, the only approach I've thought of is to read the submitted form data in a Django view and then reach into spider.py in the Scrapy project directory to add the URL to the spider's start_urls. From there, I don't know how to trigger the actual crawl, since I'm used to doing it strictly from the terminal with commands like "scrapy crawl dmoz". Thanks.

tiny edit: Just discovered scrapyd... I think I may be headed in the right direction with this.

asked Nov 14 '14 02:11

pyramidface

1 Answer

You've actually answered it yourself with your edit. The best option would be to set up a scrapyd service and make an API call to its schedule.json endpoint to trigger a scraping job.

To make that HTTP API call, you can either use urllib2/requests, or use a wrapper around the scrapyd API, python-scrapyd-api:

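from scrapyd_api import ScrapydAPI

# Connect to the scrapyd service (6800 is its default port)
scrapyd = ScrapydAPI('http://localhost:6800')
# Schedule a run of the named spider in the named project
scrapyd.schedule('project_name', 'spider_name')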
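If you would rather not add a dependency, the same call can be made by hand. The following is only a minimal sketch using the Python 3 standard library (urllib.request, the successor of urllib2). The helper names and the `url` spider argument are hypothetical; it assumes scrapyd is listening on localhost:6800 and relies on scrapyd forwarding any extra form fields to the spider as arguments.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_schedule_request(base_url, project, spider, **spider_args):
    """Build the POST request for scrapyd's schedule.json endpoint."""
    params = {"project": project, "spider": spider}
    # Any extra fields are forwarded by scrapyd to the spider as arguments.
    params.update(spider_args)
    data = urlencode(params).encode("utf-8")
    return Request(base_url.rstrip("/") + "/schedule.json", data=data, method="POST")


def schedule_spider(base_url, project, spider, **spider_args):
    """Send the request and return scrapyd's decoded JSON reply."""
    request = build_schedule_request(base_url, project, spider, **spider_args)
    with urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

From a Django view you might then call something like schedule_spider('http://localhost:6800', 'project_name', 'spider_name', url=submitted_url), where submitted_url comes from the validated form. The spider itself would have to read that `url` argument and build its start_urls from it, instead of having spider.py edited on every request.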

If we put scrapyd aside and try to run the spider directly from the view, it will block the request until the Twisted reactor stops, so it is not really an option.

You can, though, start using Celery (in tandem with django_celery): define a task that runs your Scrapy spider and call the task from your Django view. This way, you put the task on a queue and don't keep the user waiting for the crawl to finish.
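How the task actually runs the spider is up to you. One possibility, sketched here purely as an assumption, is to have the task body launch "scrapy crawl" in a child process, so the Twisted reactor never starts inside the Celery worker itself. The function names, the `url` spider argument and the project path are all hypothetical, and the Celery wiring is left in comments so the block runs with the standard library alone.

```python
import subprocess


def build_crawl_command(spider_name, url):
    """Return the argv for `scrapy crawl <spider> -a url=<url>`."""
    # `-a name=value` passes a spider argument; `url` is a hypothetical name
    # that the spider would have to read itself.
    return ["scrapy", "crawl", spider_name, "-a", "url=" + url]


def run_spider(spider_name, url, project_dir):
    """Task body: run the crawl in a child process and return its exit code."""
    # cwd must be the Scrapy project directory (the one containing scrapy.cfg).
    completed = subprocess.run(build_crawl_command(spider_name, url), cwd=project_dir)
    return completed.returncode


# With Celery installed, the function could be registered as a task and
# queued from the Django view instead of being called directly, e.g.:
#
#     from celery import shared_task
#     run_spider_task = shared_task(run_spider)
#     run_spider_task.delay("spider_name", submitted_url, "/path/to/scrapy/project")
```

Calling .delay() returns immediately, so the view can respond to the user while a worker picks the job up from the queue.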

Also, take a look at the django-dynamic-scraper package:

Django Dynamic Scraper (DDS) is an app for Django built on top of the scraping framework Scrapy. While preserving many of the features of Scrapy, it lets you dynamically create and manage spiders via the Django admin interface.

answered Oct 06 '22 00:10

alecxe
