Will using a Cloud PaaS automatically solve scalability issues?

I'm currently looking for a cloud PaaS that will let me scale an application to handle anything from 1 user to 10 million+ users. I've never worked on anything this big, and the question I can't seem to get a clear answer to is this: if you develop, say, a standard application with a relational database and SOAP web services, will it scale automatically when deployed on a PaaS, or do you still need to build the application with failover, redundancy and all those things in mind?

Let's say I deploy a Spring/Hibernate application to Amazon EC2 and create a single instance of Ubuntu Server with Tomcat installed. Will this application just scale indefinitely, or do I need more Ubuntu instances? If more than one Ubuntu instance is needed, does Amazon take care of running the application across both instances, or is this the developer's responsibility? What about database storage: can I install a database on EC2 that will scale as the database grows, or do I need to use one of their APIs instead if I want it to scale indefinitely?

CloudFoundry allows you to build locally and deploy straight to their PaaS, but since it's in beta, there's a limit on the amount of resources you can use, and databases are limited to 128MB if I remember correctly, so this is a no-go for now. Some have suggested installing CloudFoundry on Amazon EC2; how does it scale, and how is the database layer handled then?

GAE (Google App Engine): will this allow me to just deploy an app and not have to worry about how it scales and implements redundancy? There appear to be some limitations on what you can and can't run on GAE, and their recent price increase upset quite a large number of developers. Is it really that expensive compared to other providers?

So basically, will it scale and what needs to be done to make it scale?

Jan Vladimir Mostert asked Mar 25 '12 07:03


1 Answer

That's a lot of questions for one post. Anyway:

  1. Amazon EC2 does not scale automatically with load. EC2 is basically just a virtual machine. You can achieve scaling of EC2 instances with Auto Scaling and Elastic Load Balancing.

  2. SQL databases are hard to scale horizontally, which is why people started using NoSQL databases in the first place. It's best to see which databases your cloud provider offers as a managed service: Datastore on GAE and DynamoDB on Amazon.

  3. Installing your own database on EC2 instances is very impractical, as EC2 instance storage is ephemeral (it loses all data on "disk" when the instance is stopped or terminated).

  4. GAE Datastore is actually one big database shared by all applications running on it. So it's pretty scalable: your millions of users should not be a problem for it. http://highscalability.com/blog/2011/1/11/google-megastore-3-billion-writes-and-20-billion-read-transa.html

  5. Yes, App Engine scales automatically, both frontend instances and database. There is nothing special you need to do to make it scale; just use their API.

  6. There are limitations on what you can do with App Engine:

    A. No local storage (filesystem) - you need to use Datastore or Blobstore.

    B. Comet is only supported via their proprietary Channel API.

    C. Datastore is a NoSQL database: no JOINs, limited queries, limited transactions.

  7. The cost of GAE is not bad. We do 1M requests a day for about 5 dollars a day. The biggest saving comes from the fact that you do not need a system admin on GAE (but you do need one for EC2). Compared to the cost of manpower, GAE is incredibly cheap.
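
To put the figures in point 7 in perspective, here is a small back-of-the-envelope calculation. Only the 1M requests and the 5 dollars a day come from the answer; the 30-day month and the idea of an "average" rate are assumptions made for illustration, and real traffic is spiky, so peak load will be well above the average shown.

```python
# Back-of-the-envelope numbers for "1M requests a day for about 5 dollars a day".
REQUESTS_PER_DAY = 1_000_000    # from the answer
COST_PER_DAY_USD = 5.0          # from the answer ("about 5 dollars")
SECONDS_PER_DAY = 24 * 60 * 60


def average_requests_per_second(requests_per_day):
    """Average rate, assuming traffic were spread evenly (it never is)."""
    return requests_per_day / SECONDS_PER_DAY


def cost_per_million_requests(cost_per_day, requests_per_day):
    """Normalize the daily bill to a price per one million requests."""
    return cost_per_day / requests_per_day * 1_000_000


def monthly_cost(cost_per_day, days=30):
    """Projected bill for a month; 30 days is an assumption."""
    return cost_per_day * days


avg_rps = average_requests_per_second(REQUESTS_PER_DAY)
per_million = cost_per_million_requests(COST_PER_DAY_USD, REQUESTS_PER_DAY)
month = monthly_cost(COST_PER_DAY_USD)

print(f"{avg_rps:.1f} req/s on average, "
      f"${per_million:.2f} per million requests, "
      f"${month:.0f} per 30 days")
```

So the workload described is roughly a dozen requests per second on average, which is the scale at which the comparison with the cost of an administrator is being made.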

Some hints to save money on (and speed up) GAE:

A. Use get instead of query in Datastore (requires carefully crafting natural keys).

B. Use memcache to cache data you got from Datastore. This can be done automatically with Objectify and its @Cached annotation.

C. Denormalize data. Meaning you write data redundantly in various places in order to get to it in as few operations as possible.

D. If you have a lot of REST requests from devices, where you do not use cookies, then switch off session support (or roll your own, as we did). Sessions use Datastore under the hood, and every request does a get and a put.

E. Read about adjusting app settings. Try different settings (depending on how tolerant your app is to request delay and on your traffic patterns/spikes). We were able to cut down frontend instances by 70%.
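
Hints A and B can be sketched together. The following is a minimal illustration in plain Python, not App Engine or Objectify code: `FakeDatastore`, `CachedReader` and `user_key` are hypothetical names, the dictionary stands in for memcache, and the operation counters exist only to show which reads reach the datastore. It assumes the e-mail address is a suitable natural key for a user.

```python
class FakeDatastore:
    """Hypothetical in-memory stand-in for a datastore; counts operations."""

    def __init__(self):
        self._rows = {}
        self.gets = 0
        self.queries = 0

    def put(self, key, entity):
        self._rows[key] = dict(entity)

    def get(self, key):
        # Direct lookup by key: the cheap path recommended in hint A.
        self.gets += 1
        row = self._rows.get(key)
        return dict(row) if row is not None else None

    def query(self, **filters):
        # Filtered scan: stands in for the more expensive query path.
        self.queries += 1
        return [dict(r) for r in self._rows.values()
                if all(r.get(k) == v for k, v in filters.items())]


def user_key(email):
    # Natural key: built from data the caller already has, so no query
    # is needed to find the entity (assumes e-mail identifies a user).
    return "User:" + email.strip().lower()


class CachedReader:
    """Cache-aside reads, as in hint B; the dict stands in for memcache."""

    def __init__(self, store):
        self.store = store
        self.cache = {}

    def get_user(self, email):
        key = user_key(email)
        if key in self.cache:            # cache hit: no datastore operation
            return self.cache[key]
        entity = self.store.get(key)     # cache miss: one get by key
        if entity is not None:
            self.cache[key] = entity
        return entity

    def put_user(self, entity):
        key = user_key(entity["email"])
        self.store.put(key, entity)
        self.cache.pop(key, None)        # invalidate any stale cached copy


store = FakeDatastore()
reader = CachedReader(store)
reader.put_user({"email": "ann@example.com", "name": "Ann"})

first = reader.get_user("Ann@Example.com ")   # miss, one datastore get
second = reader.get_user("ann@example.com")   # served from the cache
print(store.gets, store.queries)
```

Two reads cost one datastore get and no queries here. In a real application the cache would be shared across instances and entries would need an expiry policy, which this sketch leaves out.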

Peter Knego answered Nov 16 '22 02:11