We've been using the Scrapyd service for a while now. It provides a nice wrapper around a Scrapy project and its spiders, letting you control the spiders via an HTTP API:
Scrapyd is a service for running Scrapy spiders.
It allows you to deploy your Scrapy projects and control their spiders using a HTTP JSON API.
But recently I've noticed another "fresh" package, ScrapyRT, that, according to the project description, sounds very promising and similar to Scrapyd:
HTTP server which provides API for scheduling Scrapy spiders and making requests with spiders.
Is this package an alternative to Scrapyd? If so, what is the difference between the two?
They don't have thaaat much in common. As you have already seen, you have to deploy your spiders to Scrapyd and then schedule crawls. Scrapyd is a standalone service running on a server, where you can deploy and run every project/spider you like.
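To make the "deploy, then schedule" flow concrete, here is a minimal sketch of scheduling a crawl through Scrapyd's schedule.json endpoint. It assumes Scrapyd is listening on its default port 6800 on localhost and that the project has already been deployed; the helper names and the project/spider names are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Scrapyd is running locally on its default port.
SCRAPYD_URL = "http://localhost:6800"


def build_schedule_request(project, spider, base_url=SCRAPYD_URL, **spider_args):
    """Build a POST request for Scrapyd's schedule.json endpoint.

    Extra keyword arguments are sent as additional form fields,
    which Scrapyd passes on to the spider as arguments.
    """
    fields = {"project": project, "spider": spider, **spider_args}
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(f"{base_url}/schedule.json", data=body, method="POST")


def schedule(project, spider, **spider_args):
    """Send the request and return the decoded JSON reply.

    The reply carries a job id, not the scraped items: the crawl
    runs later, inside the Scrapyd service.
    """
    request = build_schedule_request(project, spider, **spider_args)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Usage (needs a running Scrapyd with a deployed project):
#   reply = schedule("myproject", "myspider")
#   print(reply["jobid"])
```

Note that the call returns as soon as the job is queued; the items stay on the server side.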
With ScrapyRT you choose one of your projects and cd to that directory. Then you run e.g. scrapyrt and start crawls for spiders in that project through a simple (and very similar to Scrapyd's) REST API. You then get the crawled items back as part of the JSON response.
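As a rough sketch of what that looks like from the client side, the snippet below calls ScrapyRT's crawl.json endpoint and reads the items straight out of the response. It assumes scrapyrt was started inside the project directory and is listening on its default port 9080; the helper names, spider name and page URL are placeholders.

```python
import json
import urllib.parse
import urllib.request

# Assumption: scrapyrt was started in the project directory on its default port.
SCRAPYRT_URL = "http://localhost:9080"


def build_crawl_url(spider_name, url, base_url=SCRAPYRT_URL):
    """Build the GET URL for ScrapyRT's crawl.json endpoint."""
    query = urllib.parse.urlencode({"spider_name": spider_name, "url": url})
    return f"{base_url}/crawl.json?{query}"


def crawl(spider_name, url):
    """Run the spider against one URL and return the scraped items.

    Unlike Scrapyd, the request blocks until the crawl finishes and
    the items come back in the same JSON response.
    """
    with urllib.request.urlopen(build_crawl_url(spider_name, url)) as response:
        return json.load(response).get("items", [])


# Usage (needs a running scrapyrt instance):
#   for item in crawl("myspider", "http://example.com/page"):
#       print(item)
```

Because the items travel back in the HTTP response, this suits small, on-demand crawls better than long-running jobs.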
It's a very nice idea and it looks fast, lean and well defined. Scrapyd on the other hand is more mature and more generic.
Here are some key differences:
ScrapyRT takes a url argument which, as far as I can tell, overrides any start_urls-related logic.
I would say that ScrapyRT and Scrapyd very cleverly don't overlap at this point in time. Of course, you never know what the future holds.