 

How to run a Scrapy spider inside an asyncio event loop?

Tags:

python

scrapy

It looks like I have hit a dead end. Is there any way to run a Scrapy spider inside an asyncio event loop? For example, in the code below:

import asyncio
from scrapy.crawler import CrawlerProcess
from myscrapy import MySpider
import scrapy

async def do_some_work():
    process = CrawlerProcess()
    await process.crawl(MySpider)

loop = asyncio.get_event_loop()
loop.run_until_complete(do_some_work())

Which leads me to the error:

raise TypeError('A Future, a coroutine or an awaitable is required')
TypeError: A Future, a coroutine or an awaitable is required

I understand that await must be followed by an awaitable, such as a coroutine. Is there any way around this so that the crawl still runs asynchronously? Thank you.

Lord G. asked Nov 07 '22 13:11



1 Answer

Scrapy does not currently support the async syntax.

If you need to run Scrapy within asyncio-based code, you need to run Scrapy from a script, just as you would run any other synchronous code within an asynchronous function.
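One way to treat the crawl as "any other synchronous code" is to start it in a child process and await that process from the event loop. The following is only a minimal sketch of that idea, not an official Scrapy recipe. The helper names run_in_subprocess and do_some_work are illustrative, and the child command is a stand-in so the example runs without a Scrapy project. In a real project it might be something like "scrapy", "crawl", "myspider", where myspider is a hypothetical spider name.

```python
import asyncio
import sys


async def run_in_subprocess(*args):
    """Run a command in a child process without blocking the event loop.

    Returns a tuple of (return code, decoded stdout, decoded stderr).
    """
    proc = await asyncio.create_subprocess_exec(
        *args,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    # communicate() waits for the process to exit and collects its output.
    stdout, stderr = await proc.communicate()
    return proc.returncode, stdout.decode(), stderr.decode()


async def do_some_work():
    # Stand-in command (assumption): a tiny Python one-liner instead of a crawl.
    # With a Scrapy project you might pass "scrapy", "crawl", "myspider" instead.
    return await run_in_subprocess(
        sys.executable, "-c", "print('crawl finished')"
    )
```

Calling asyncio.run(do_some_work()) from a script drives the coroutine to completion. Because the crawl would live in its own process, its Twisted reactor never shares a thread with the asyncio loop, and other coroutines keep running while it works.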

That said, it is something that might be available in the future. There has been a Google Summer of Code proposal in that direction, and there is an ongoing discussion on the topic.

Gallaecio answered Nov 14 '22 21:11
