
Thinking in AppEngine

I'm looking for resources to help migrate my design skills from a traditional RDBMS data store over to the AppEngine DataStore (i.e. 'soft schema' style). I've seen several presentations, and they all touch on the overarching themes and some specific techniques.

I'm wondering if there's a place we could pool knowledge from experience ("from the trenches") on real-world approaches to rethinking how data is structured, especially when porting existing applications. We're heavily Hibernate-based and have probably travelled a bit down the wrong path with our data model already, generating some gnarly queries that our DB is struggling with.

Please respond if:

  1. You have ported a non-trivial application over to AppEngine
  2. You've created a common type of application from scratch in AppEngine
  3. You've done neither 1 nor 2, but are considering it and want to share your own findings so far.
Mark Renouf asked Jun 10 '09 16:06



3 Answers

> I'm wondering if there's a place we could pool knowledge from experience

Various Google Groups are good for that, though I don't know if any are directly applicable to Java-GAE yet -- my GAE experience so far is all-Python (I'm kind of proud to say that Guido van Rossum, inventor of Python and now working at Google on App Engine, told me I had taught him a few things about how his brainchild worked -- his recommendation mentioning that is now the one I'm proudest of among all those on my LinkedIn profile;-). [I work at Google but my impact on App Engine was very peripheral -- I worked on "building the cloud", cluster and network management SW, and App Engine is about making that infrastructure useful for third-party developers.]

There are indeed many essays & presentations on how best to denormalize and shard your data for optimal GAE scaling and performance -- they're of varying quality, though. The books that are out so far are so-so; many more are coming in the next few months, hopefully better ones (I had a project to write one of those, with two very skilled friends, but we're all so busy that we ended up dropping it). In general, I'd recommend the Google I/O videos and the essays that Google blessed in its app engine site and blogs, PLUS every bit of content from appenginefan's blog -- what Guido commended me for teaching him about GAE, I in turn mostly learned from appenginefan (partly through the wonderful app engine meetup in Palo Alto, but his blog is great too;-).

Alex Martelli answered Oct 02 '22 13:10



I played around with Google App Engine for Java and found that it had many shortcomings:

This is not general-purpose Java application hosting. In particular, you do not have access to a full JRE (e.g. you cannot create threads). Given this, you pretty much have to build your application from the ground up with the Google App Engine JRE in mind. Porting any non-trivial application would be impossible.

More pertinent to your datastore questions...

The datastore performance is abysmal. I was trying to write 5000 weather observations per hour -- nothing too massive -- but I could not do it because I kept running into timeout exceptions from both the datastore and the HTTP request. Using the "low-level" datastore API helped somewhat, but not enough.

I wanted to delete those weather observations after 24 hours so as not to fill up my quota. Again, I could not do it because the delete operation took too long. This problem in turn led to my datastore quota filling up. Insanely, you cannot easily delete large swaths of data in the GAE datastore.
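
One general way to work around a per-request time limit on bulk deletes is to delete in small batches, stop when a time budget is used up, and resume in a later request. The sketch below shows only that control flow in plain Python: a dict stands in for the datastore, and `delete_in_batches`, its parameters, and the `clock` hook are all hypothetical names for illustration, not App Engine APIs.

```python
import time


def delete_in_batches(store, is_expired, batch_size=100,
                      budget_seconds=5.0, clock=time.monotonic):
    """Delete expired entries from `store` in small batches.

    `store` is a plain dict standing in for the datastore (an assumption
    for illustration). Returns (deleted_count, finished); `finished` is
    False when the time budget ran out and another pass is needed.
    """
    deadline = clock() + budget_seconds
    deleted = 0
    while True:
        # Collect at most one batch of expired keys without mutating
        # the dict while iterating over it.
        batch = []
        for key, value in store.items():
            if is_expired(value):
                batch.append(key)
                if len(batch) == batch_size:
                    break
        if not batch:
            return deleted, True  # nothing left to delete
        for key in batch:
            del store[key]
        deleted += len(batch)
        if clock() >= deadline:
            return deleted, False  # out of time; resume in a later request


# Usage: 250 fake observations, of which those with a value below 200 are "old".
observations = {i: i for i in range(250)}
count, done = delete_in_batches(observations, lambda age: age < 200)
```

Each call does a bounded amount of work, so a caller can invoke it repeatedly (for example from a scheduled request) until `finished` comes back True.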

There are some features that I did like. Eclipse integration is snazzy. The appspot application server UI is a million times better than working with Tomcat (e.g. nice views of logs). But the minuses far outweighed those benefits for me.

In sum, I constantly found myself having to shave the yak, in order to do something that would have been pretty trivial in any normal Java / application hosting environment.

Julien Chastang answered Oct 02 '22 12:10



The timeouts are tight and performance was ok but not great, so I found myself using extra space to save time; for example I had a many-to-many relationship between trading cards and players, so I duplicated the information of who owns what: Card objects have a list of Players and Player objects have a list of Cards.

Normally storing all your information twice would have been silly (and prone to get out of sync) but it worked really well.
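
A minimal sketch of that duplication, using plain in-memory Python objects rather than datastore models. The class names, field names, and helper functions are hypothetical, chosen only to illustrate the idea; the key point is that every change goes through one function that updates both copies, which is what keeps them from drifting apart.

```python
class Player:
    def __init__(self, name):
        self.name = name
        self.card_ids = []      # duplicated copy: cards this player owns


class Card:
    def __init__(self, card_id):
        self.card_id = card_id
        self.player_names = []  # duplicated copy: players who own this card


def give_card(player, card):
    """Record ownership on both sides so either lookup needs one object."""
    if card.card_id not in player.card_ids:
        player.card_ids.append(card.card_id)
    if player.name not in card.player_names:
        card.player_names.append(player.name)


def take_card(player, card):
    """Remove ownership from both sides."""
    if card.card_id in player.card_ids:
        player.card_ids.remove(card.card_id)
    if player.name in card.player_names:
        card.player_names.remove(player.name)


def is_consistent(players, cards):
    """Check that the two duplicated views describe the same ownership."""
    forward = {(p.name, cid) for p in players for cid in p.card_ids}
    backward = {(name, c.card_id) for c in cards for name in c.player_names}
    return forward == backward


# Usage
alice, bob = Player("alice"), Player("bob")
ace, king = Card("ace"), Card("king")
give_card(alice, ace)
give_card(bob, ace)
give_card(alice, king)
take_card(bob, ace)
```

With this layout, "which cards does Alice own?" reads only the Player, and "who owns the ace?" reads only the Card, at the cost of two writes per change. A periodic check like `is_consistent` is one way to catch the out-of-sync case.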

In Python they recently released a remote API that gives you an interactive shell to the datastore, so you can work with your data without any timeouts or limits (for example, you can delete large swaths of data, or refactor your models). This is fantastically useful since otherwise, as Julien mentioned, it was very difficult to do any bulk operations.

Kiv answered Oct 02 '22 13:10
