
Start Scrapy from a Flask route

I want to build a crawler that takes the URL of a webpage, scrapes it, and returns the result to a webpage. Right now I start Scrapy from the terminal and store the response in a file. How can I start the crawler when a URL is posted to my Flask app, process it, and return the response?

Asked by Ashish on Sep 28 '22
1 Answer

You need to create a CrawlerProcess inside your Flask application and run the crawl programmatically (see the Scrapy docs on running Scrapy from a script).

import scrapy
from scrapy.crawler import CrawlerProcess

class MySpider(scrapy.Spider):
    # Your spider definition
    ...

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})

process.crawl(MySpider)
process.start() # The script will block here until the crawl is finished
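Wired into a Flask route, this could look roughly like the sketch below. The /crawl endpoint and the start_url argument are illustrative (your spider would need to accept start_url in its constructor), and note that CrawlerProcess starts the Twisted reactor, which cannot be restarted in the same process, so this naive version only handles the first request.

from flask import Flask, request, jsonify
from scrapy.crawler import CrawlerProcess

app = Flask(__name__)

@app.route('/crawl', methods=['POST'])
def crawl():
    url = request.form['url']  # URL posted by the client
    process = CrawlerProcess({
        'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
    })
    # MySpider is the spider defined above; start_url is a made-up
    # keyword argument that your spider's __init__ would have to accept
    process.crawl(MySpider, start_url=url)
    process.start()  # blocks until the crawl is finished
    return jsonify({'status': 'finished'})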

Before moving on with your project I advise you to look into a Python task queue (like rq). This will allow you to run Scrapy crawls in the background and your Flask application will not freeze while the scrapes are running.
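For example, a minimal sketch with rq (assuming a Redis server on localhost, and that MySpider and run_spider live in a module the rq worker can import; the /crawl endpoint and start_url argument are again illustrative):

from flask import Flask, request, jsonify
from redis import Redis
from rq import Queue
from scrapy.crawler import CrawlerProcess

app = Flask(__name__)
queue = Queue(connection=Redis())  # assumes Redis on localhost:6379

def run_spider(url):
    # Executed by an rq worker in its own process, so the Twisted
    # reactor starts fresh for every job
    process = CrawlerProcess({
        'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
    })
    process.crawl(MySpider, start_url=url)
    process.start()

@app.route('/crawl', methods=['POST'])
def crawl():
    job = queue.enqueue(run_spider, request.form['url'])
    return jsonify({'job_id': job.get_id()})

Run a worker with `rq worker` in a separate shell; the Flask route then returns immediately with a job id you can poll for the result.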

Answered by nivix zixer on Oct 10 '22