I'm building a Lifestreaming app that will involve pulling down lots of feeds for lots of users, and performing data-mining, and machine learning algorithms on the results. GAE's load balanced and scalable hosting sounds like a good fit for a system that could eventually be moving around a LOT of data, but it's lack of cron jobs is a nuisance. Would I be better off using Django on a co-loc and dealing with my own DB scaling?
Google App Engine (GAE) is a service for developing and hosting Web applications in Google's data centers, belonging to the platform as a service (PaaS) category of cloud computing.
Google App Engine provides four possible runtime environments for applications, one for each of four programming languages: Java, Python, PHP, and Go. The environment you choose depends on the language and related technologies you want to use for developing the application.
Google App Engine (GAE) is a platform-as-a-service product that provides web app developers and enterprises with access to Google's scalable hosting and tier 1 internet service. GAE requires that applications be written in Java or Python, store data in Google Bigtable and use the Google query language.
While I can not answer your question directly, my experience of building Microupdater (a news aggregator collecting a few hundred feeds on AppEngine) may give you a little insight.
Fetching feeds. Fetching lots of feeds by cron jobs (it was the only solution until SDK 1.2.5) is not efficient and scalable, which has lower limit on job frequency (say 1 min, so you could only fetch at most 60 feeds hourly). And with latest SDK 1.2.5, there is XMPP API, which I have not implemented yet. The best promising approach would be PubSubHubbub, of which you offer an callback url and HubBub will notify you new entries in real-time. And there is an demo implementation on AppEngine, which you can play around.
Parsing feeds. You may already know that parsing feeds is cpu-intensive. I use Universal Feed Parser by Mark Pilgrim, when parsing a large feed (say a public google reader topic), AppEngine may fail to process all entries. My dashboard have a lot of these CPU-limit warnings. But it may result in my incapability to optimize the code yet.
Totally said, AppEngine is not yet an ideal platform for lifestream app, but that may change in future.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With