
Is Scrapy supported on Google App Engine?

It has the following dependencies:
- Twisted 2.5.0, 8.0 or above
- lxml or libxml2 (if using libxml2, version 2.6.28 or above is highly recommended)
- simplejson
- pyopenssl

Zhaidarbek asked May 17 '11 15:05


3 Answers

You cannot use C extensions on App Engine, which rules out lxml and (I believe) libxml2 and pyopenssl.

I doubt most of what Twisted does is possible in the App Engine sandbox either; you can't directly open sockets or spawn threads.

EDIT (January 2013): The Python 2.7 runtime does include some C extensions, including lxml. However, it's still not possible to use C extensions that aren't provided by Google with the runtime, so Scrapy is most likely still unusable at this time.
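To see which of the dependencies from the question are actually importable in a given runtime, something like the following could be run there. This is only a rough sketch: the helper name missing_modules is made up for illustration, and a successful import does not prove that Twisted's networking will work inside the sandbox.

```python
import importlib

# Import names for the dependencies listed in the question.
# Note: pyopenssl is imported under the module name "OpenSSL".
SCRAPY_DEPENDENCIES = ["twisted", "lxml", "simplejson", "OpenSSL"]


def missing_modules(names):
    """Return the subset of module names that fail to import (hypothetical helper)."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(SCRAPY_DEPENDENCIES)
    if missing:
        print("Not importable here: " + ", ".join(missing))
    else:
        print("All listed dependencies are importable.")
```

An empty result only means the modules load; whether sockets and threads are permitted is a separate question.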

Wooble answered Sep 28 '22 05:09


No, but you could try AWS (http://dev.scrapy.org/wiki/AmazonEC2).

user answered Sep 28 '22 05:09


Update for 2019:
Scrapy does work on GAE: I can confirm that it can be deployed on the GAE Python 3 standard environment using ScrapyRT.

Your scrapy.cfg file must be in the same directory as app.yaml so that it gets picked up. A minimal app.yaml would look like this:

runtime: python37

instance_class: F2

env_variables:
  PORT: 8080

entrypoint: scrapyrt -i 0.0.0.0 -p $PORT -s LOG_DIR=/tmp

Note that LOG_DIR is set to /tmp, which is most likely not what anyone would want in a production environment. I might extend this answer once I have figured out how to approach this properly.
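Once deployed, ScrapyRT exposes an HTTP endpoint (crawl.json) that takes a spider name and a start URL as query parameters. Here is a minimal client sketch using only the standard library. The base URL, the spider name "example" and the helper names are placeholders I made up for illustration, not part of the setup above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_crawl_url(base_url, spider_name, start_url):
    """Build the ScrapyRT crawl.json URL for one spider run (hypothetical helper)."""
    query = urlencode({"spider_name": spider_name, "url": start_url})
    return base_url.rstrip("/") + "/crawl.json?" + query


def crawl(base_url, spider_name, start_url, timeout=60):
    """Trigger a crawl via ScrapyRT and return the decoded JSON response."""
    request_url = build_crawl_url(base_url, spider_name, start_url)
    with urlopen(request_url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (assumes a ScrapyRT instance on localhost and a spider
# named "example" in the project; both are placeholders):
# result = crawl("http://localhost:8080", "example", "https://example.com/")
# print(result.get("items"))
```

The crawl only returns after the spider has finished, so the timeout probably needs to be generous for anything but small spiders.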

nichoio answered Sep 28 '22 06:09