Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Final GAE vs AWS architectural decision

I know this has been asked one way or another before, but most of the main issues to do with GAE stability seem to have been asked around the end of 2008, early 2009, or aren't directly related to games at scale (which I'm interested in).

Basically, I have been arguing back and forth with my business partner about whether to use GAE or AWS for the back-end of our social game engine, and now it's crunch time. I love GAE (Java) for so many reasons, and although it used to be unstable, it's pretty good now. The main argument in favour of AWS is the fact that AWS has proven itself with multiple games running tens of millions of active users per day. The obvious pin-up child for AWS is Zynga, with its Farmville peaking at 80+million DAU. And that's just one of the hugely successful games running on the AWS infrastructure. Remarkable achievement.

So, one way or another it's KNOWN to work. GAE on the other hand doesn't have any examples that I could find doing these sorts of numbers. Not even close. So can I trust it? Is there a single example of a large social game with 2 million+ Daily Active Users, using GAE?

The main considerations for our social game back-end are:

  1. Reliable CDN (Amazon CloudFront/S3 is excellent for this, as is Google's obviously excellent DataStore).
  2. Ability to scale without falling over (AWS-EC2 is proven here, GAE doesn't seem to have examples of large game apps which can run into the 1000s of requests per second. GAE used to be quite unstable in this regard and so is my main concern).
  3. Reliable no-SQL database. (AWS-SimpleDB and Google's DataStore are both excellent for this. We really don't need SQL).
  4. Support/someone to call/contact if there is a problem. (This is one of the biggest worries with GAE. I have no idea who I can call, or if it's even possible. AWS has an SLA and support.)

I look forward to your thoughts, but please also note, this is not intended to start any sort of flame war. I love both systems, but both have their positives and negatives, but I'm about to make an architectural decision that likely won't be undone moving forward.

Regards,

Shane

like image 409
Shane Avatar asked Dec 02 '10 06:12

Shane


People also ask

Which of the following is an architectural best practice recommended by AWS?

The AWS Well-Architected Framework helps you understand the pros and cons of decisions you make while building systems on AWS. Using the Framework helps you learn architectural best practices for designing and operating secure, reliable, efficient, cost-effective, and sustainable workloads in the AWS Cloud.

Why is an AWS well-architected review a critical part of the cloud design process?

This allows you to focus on the other aspects of design, such as functional requirements. The AWS Well-Architected Framework helps cloud architects build the most secure, high-performing, resilient, and efficient infrastructure possible for their applications.

Which pillar of the AWS well-architected framework provides recommendations to help customers select the right compute resources based on workload requirements?

Performance Efficiency Pillar Key topics include selecting resource types and sizes optimized for workload requirements, monitoring performance, and maintaining efficiency as business needs evolve.


2 Answers

I've never worked with AWS-EC2 so I'm going to share my knowledge just on the Google App Engine side.

  1. Google App Engine is not meant to be a CDN; though it can serve static content through its powerful infrastructure providing caching close to the users, it does not guarantee the same kind of high quality and high availability service of a real CDN because it's not part of its duties.
    Further data:
    • Maximum size of a file using the BlobStore service: 2 Gigabytes
    • Maximum size of a static file: 10 Megabytes
    • Currently App Engine always returns 200 status for static files even on Conditional gets (you have to rely on third party caching library like cirruxcache for example).
  2. Recently Google App Engine team has shut down the App Gallery for one simple reason: too many Toy Apps!
    Google wants to counteract this tendency showing successful businesses case studies; here are some of them:

    • BuddyPoke (viral Facebook app with 65 million installs)
    • WalkScore (serves 3 million request a day to thousands of real estate partner sites)
    • Webfillings
    • Snapabug
    • Optimizely
    • Ubisoft Facebook TikTok game

    Other interesting case studies here

  3. "We are well aware of downtimes and reliability issues, and are working hard to solve them: Improving App Engine reliability is our number one priority" was recently said by a Google Developer Relations Manager here.
    App Engine is still in beta and is an evolving platform so you have to be prepared to deal with downtimes and issues.

  4. Google App Engine team has just launched a preview of App Engine for Business providing 99.9% uptime service level agreement and premium developer support available.

Here is my opinion for what it's worth:
I'm aware that it's a tough call; having read a lot of articles about GAE I have mixed feelings about it because you can go from the recent catastrophic Carlos Ble report to the happy experience of Flower Garden or Gri.pe.
App Engine for Business looks promising and I would consider it in the case of a serious business project plan. The fresh SDK 1.4.0 is huge and it clearly shows that the Team is really pushing hard to fix some annoying issues (Warmup requests) and relaxing some limitations (10 minutes process on TaskQueue).

Last thing to consider: if you are going to have big numbers, the Google App Engine Team will probably take your app as a successfull case study to follow with a boost of free and powerful Hype.

like image 180
systempuntoout Avatar answered Sep 19 '22 12:09

systempuntoout


BuddyPoke is one example of a large-scale social app running on GAE. How large I'm not sure. This article says 30m daily page views (not users):

http://googleappengine.blogspot.com/2008/10/app-engine-case-studies.html

Their facebook page says 2.7 million monthly (not daily) users:

http://www.facebook.com/buddypoke

Although, they are also on a heap of other social networks:

http://www.buddypoke.com/

Personally I decided to go with GAE, for a couple of main reasons:

  • The unit of scalability is a single request, not a whole instance like it is with AWS.
  • I can work at a higher level, without having to worry about configuring instances.

If your point 4 is a big one for you, then you may be better off with AWS. With GAE there appears to be nothing you can do, and no-one you can contact.

About a week ago I had an issue with my app - it had suddenly started failing in Google's code, in a location which had been working fine for the last 5 days, ie since I had last uploaded my app. The only way to report issues to Google seems to be via their production issue template, here:

http://code.google.com/p/googleappengine/issues/entry?template=Production%20issue

I reported the issue, and didn't hear anything. Since it's running on Google's servers I was unable to resort to any 'usual' emergency tactics like restarting a server. An hour later and the problem resolved itself - I'm not sure if someone at Google saw my message and fixed something, or if it just went away. I updated my bug report to say the problem was fixed, but even now a week later the issue hasn't been closed or even acknowledged. Also since the issue has to be posted publicly, my app is now getting random hits from bots.

Admittedly my app is currently only in beta and so only has a hundred or so users, and so it wasn't a major incident for me. If I was getting thousands / millions of hits, maybe either Google would have noticed the problem themselves earlier, or they would have paid more attention to my bug report.

On your point 3, even my small app with a small amount of traffic throws occasional data store errors (even during times which aren't reported on the availability charts as outages).

Having said this, I still like GAE (I am using the Python version), and plan to stick with it. The promise of GAE is its scalability - although it falls over occasionally now for my small traffic, it shouldn't fall over any more when it scales to much more traffic (ie your point 2), provided I've coded it correctly to avoid contention. I'll see how it goes.

Finally regarding your point 1, the blobstore and/or static files are more like a CDN on GAE, than the datastore. However for very large amounts of traffic, a real CDN may be cheaper. It's also not necessarily a CDN, see Google app engine & CDN.

like image 35
Saxon Druce Avatar answered Sep 19 '22 12:09

Saxon Druce