I have a web-crawling Python script that takes hours to complete and is infeasible to run in its entirety on my local machine. Is there a convenient way to deploy this to a simple web server? The script basically downloads webpages into text files. How would this be best accomplished? Thanks!
Since you said that performance is a problem and you are doing web scraping, the first thing to try is the Scrapy framework: a very fast and easy-to-use web-scraping framework. The scrapyd tool then lets you distribute the crawling - you can run multiple scrapyd services on different servers and split the load between them. See the scrapyd documentation for the deployment details.
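Here is a minimal sketch of what such a spider could look like for your use case (dumping each page into a text file). The spider name, seed URL, allowed domain, and output directory are placeholders, not taken from your script:

```python
import os
from urllib.parse import urlparse

import scrapy


class PageSaverSpider(scrapy.Spider):
    name = "pagesaver"
    allowed_domains = ["example.com"]        # placeholder: keeps the crawl contained
    start_urls = ["https://example.com/"]    # placeholder seed URL

    def parse(self, response):
        # Build a filename from the URL and dump the raw page body to disk.
        parsed = urlparse(response.url)
        filename = (parsed.netloc + parsed.path).strip("/").replace("/", "_") or "index"
        os.makedirs("pages", exist_ok=True)
        with open(os.path.join("pages", filename + ".txt"), "wb") as f:
            f.write(response.body)

        # Follow links on the page so the crawl keeps going.
        for href in response.css("a::attr(href)").getall():
            yield response.follow(href, callback=self.parse)
```

You can try it locally with `scrapy runspider pagesaver_spider.py` before moving it into a full Scrapy project for scrapyd.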
There is also a Scrapy Cloud service out there:
Scrapy Cloud bridges the highly efficient Scrapy development environment with a robust, fully-featured production environment to deploy and run your crawls. It's like a Heroku for Scrapy, although other technologies will be supported in the near future. It runs on top of the Scrapinghub platform, which means your project can scale on demand, as needed.
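If you go the self-hosted scrapyd route instead, then once the project is deployed (for example with scrapyd-client's `scrapyd-deploy` command) you can start crawls remotely through scrapyd's JSON API. A small sketch, where the host, project name, and spider name are assumptions to replace with your own:

```python
import requests

SCRAPYD = "http://my-server.example.com:6800"  # assumed scrapyd host

# Schedule a run of the spider on the remote scrapyd service.
resp = requests.post(
    f"{SCRAPYD}/schedule.json",
    data={"project": "mycrawler", "spider": "pagesaver"},
)
print(resp.json())  # e.g. {"status": "ok", "jobid": "..."} on success
```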
As an alternative to the solutions already given, I would suggest Heroku. You can not only easily deploy a website there, but also scripts and bots that run in the background.
The basic account is free and pretty flexible.
This blog entry, this one, and this video contain practical examples of how to make it work.
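On Heroku, a long-running crawler is usually declared as a worker process in the Procfile (e.g. `worker: python crawl.py`). Below is a minimal sketch of that kind of standalone script, assuming a hypothetical hard-coded URL list and local output directory:

```python
import os

import requests

# Placeholder URL list -- in practice this would come from your crawl frontier.
URLS = [
    "https://example.com/",
    "https://example.org/",
]

os.makedirs("pages", exist_ok=True)

for i, url in enumerate(URLS):
    try:
        resp = requests.get(url, timeout=30)
        resp.raise_for_status()
    except requests.RequestException as exc:
        print(f"skipping {url}: {exc}")
        continue
    # Save the page text to a numbered file.
    with open(os.path.join("pages", f"page_{i}.txt"), "w", encoding="utf-8") as f:
        f.write(resp.text)
    print(f"saved {url}")
```

Keep in mind that a Heroku dyno's filesystem is ephemeral, so for a real crawl you would push the resulting text files to external storage (e.g. S3) rather than keeping them on the dyno.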