I know this has been asked one way or another before, but most of the main issues to do with GAE stability seem to have been asked around the end of 2008, early 2009, or aren't directly related to games at scale (which I'm interested in).
Basically, I have been arguing back and forth with my business partner about whether to use GAE or AWS for the back-end of our social game engine, and now it's crunch time. I love GAE (Java) for so many reasons, and although it used to be unstable, it's pretty good now. The main argument in favour of AWS is the fact that AWS has proven itself with multiple games running tens of millions of active users per day. The obvious pin-up child for AWS is Zynga, with its Farmville peaking at 80+million DAU. And that's just one of the hugely successful games running on the AWS infrastructure. Remarkable achievement.
So, one way or another it's KNOWN to work. GAE on the other hand doesn't have any examples that I could find doing these sorts of numbers. Not even close. So can I trust it? Is there a single example of a large social game with 2 million+ Daily Active Users, using GAE?
The main considerations for our social game back-end are:
I look forward to your thoughts, but please also note, this is not intended to start any sort of flame war. I love both systems, but both have their positives and negatives, but I'm about to make an architectural decision that likely won't be undone moving forward.
Regards,
Shane
The AWS Well-Architected Framework helps you understand the pros and cons of decisions you make while building systems on AWS. Using the Framework helps you learn architectural best practices for designing and operating secure, reliable, efficient, cost-effective, and sustainable workloads in the AWS Cloud.
This allows you to focus on the other aspects of design, such as functional requirements. The AWS Well-Architected Framework helps cloud architects build the most secure, high-performing, resilient, and efficient infrastructure possible for their applications.
Performance Efficiency Pillar Key topics include selecting resource types and sizes optimized for workload requirements, monitoring performance, and maintaining efficiency as business needs evolve.
I've never worked with AWS-EC2 so I'm going to share my knowledge just on the Google App Engine side.
Recently Google App Engine team has shut down the App Gallery for one simple reason: too many Toy Apps!
Google wants to counteract this tendency showing successful businesses case studies; here are some of them:
Other interesting case studies here
"We are well aware of downtimes and reliability issues, and are working hard to solve them: Improving App Engine reliability is our number one priority" was recently said by a Google Developer Relations Manager here.
App Engine is still in beta and is an evolving platform so you have to be prepared to deal with downtimes and issues.
Google App Engine team has just launched a preview of App Engine for Business providing 99.9% uptime service level agreement and premium developer support available.
Here is my opinion for what it's worth:
I'm aware that it's a tough call; having read a lot of articles about GAE I have mixed feelings about it because you can go from the recent catastrophic Carlos Ble report to the happy experience of Flower Garden or Gri.pe.
App Engine for Business looks promising and I would consider it in the case of a serious business project plan.
The fresh SDK 1.4.0 is huge and it clearly shows that the Team is really pushing hard to fix some annoying issues (Warmup requests) and relaxing some limitations (10 minutes process on TaskQueue).
Last thing to consider: if you are going to have big numbers, the Google App Engine Team will probably take your app as a successfull case study to follow with a boost of free and powerful Hype.
BuddyPoke is one example of a large-scale social app running on GAE. How large I'm not sure. This article says 30m daily page views (not users):
http://googleappengine.blogspot.com/2008/10/app-engine-case-studies.html
Their facebook page says 2.7 million monthly (not daily) users:
http://www.facebook.com/buddypoke
Although, they are also on a heap of other social networks:
http://www.buddypoke.com/
Personally I decided to go with GAE, for a couple of main reasons:
If your point 4 is a big one for you, then you may be better off with AWS. With GAE there appears to be nothing you can do, and no-one you can contact.
About a week ago I had an issue with my app - it had suddenly started failing in Google's code, in a location which had been working fine for the last 5 days, ie since I had last uploaded my app. The only way to report issues to Google seems to be via their production issue template, here:
http://code.google.com/p/googleappengine/issues/entry?template=Production%20issue
I reported the issue, and didn't hear anything. Since it's running on Google's servers I was unable to resort to any 'usual' emergency tactics like restarting a server. An hour later and the problem resolved itself - I'm not sure if someone at Google saw my message and fixed something, or if it just went away. I updated my bug report to say the problem was fixed, but even now a week later the issue hasn't been closed or even acknowledged. Also since the issue has to be posted publicly, my app is now getting random hits from bots.
Admittedly my app is currently only in beta and so only has a hundred or so users, and so it wasn't a major incident for me. If I was getting thousands / millions of hits, maybe either Google would have noticed the problem themselves earlier, or they would have paid more attention to my bug report.
On your point 3, even my small app with a small amount of traffic throws occasional data store errors (even during times which aren't reported on the availability charts as outages).
Having said this, I still like GAE (I am using the Python version), and plan to stick with it. The promise of GAE is its scalability - although it falls over occasionally now for my small traffic, it shouldn't fall over any more when it scales to much more traffic (ie your point 2), provided I've coded it correctly to avoid contention. I'll see how it goes.
Finally regarding your point 1, the blobstore and/or static files are more like a CDN on GAE, than the datastore. However for very large amounts of traffic, a real CDN may be cheaper. It's also not necessarily a CDN, see Google app engine & CDN.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With