What is the organic growth process from a standalone solution into a Software as a Service? Clearly, scalability is not a "feature" tacked on at the end of development, so I'm interested in the high-level code and architecture changes required.
Does one pick an existing platform and overnormalize it?
Does one start over with a bare-bones cloud architecture, then migrate legacy functionality?
Do aggressive technology upgrades (e.g. Web Forms → MVC) fit into the process?
I've been asked for some clarification on the current project architecture. Without going into too much detail, think of a .NET Web Forms application that plugs into a layer of business logic and integrates with multiple third-party vendors. Whenever new platform instances are required (I lack the terminology here; what I mean is when a new client requires business logic adjustments, integration with different third-party providers, hot new branding, etc.), the existing code is branched and a new environment is set up. Any changes are effectively very low-level, whether they happen directly in aspx files, component code or db config.
This scenario seems perfectly suited to having a "proper" SaaS model implemented, but I'm having difficulty constructively contributing to the migration process. To rephrase the original questions, which would be an efficient strategy to follow:
Overnormalize the existing platform and make everything configurable, effectively suspending this simulated scalability and not bringing on new clients until the architecture is refactored. The downside to this, imho, is continuing to rely on code and structure not built for scalability (details below).
Start from scratch with whatever is deemed (subjectively) the best architecture for the solution going forward, then migrate legacy functionality as needed. This allows for almost any desired technology upgrade, but lacks visibility until completed and, being an aggressive change, will be seen as inherently high-risk by management.
Personally, I'm leaning towards the second option because of the amount of legacy code present and the lack of sufficient db normalization. At the same time, the existing solution is mature and functional (if it isn't broken, don't fix it), and there are likely many more ways to scale beyond the two approaches I've listed above.
If the context above allows for scenario-specific advice, I'll take it. However, I'm still open to more general do's and don'ts and pointers suitable for a wider audience.
The key is architecture. The way is to divide and conquer. The approach is to relax.
The Key Is Architecture

The most important component of a building is its architecture: the way space is shaped with walls and floors, windows and ceilings, i.e., with the elements of the construction itself. The architect's aim is not to design the walls; he designs them as a secondary part of the real job, which is designing the space that is shaped by the walls. We do not build buildings to have walls, we build them to have the space inside.
We first design the functionality, which is what we want from the software. Then we go into the details of making it possible, i.e., building the product. The technologies we use are not the main thing; they are the walls, and what we really want is the space itself.
With a good understanding of what we want from the software we are building, engineering becomes a much easier and almost automatic process. Once the definitions are well understood, technical difficulties start appearing together with, fortunately, obvious solutions.
Divide and Conquer

One very important measure of a good architecture is that it makes the components of the solution clearly defined. When we can see these components of the big picture separately, we can divide the work into separate parts and then build separate things that work together.
Relax

Maybe this title sounds like I am trying to bring some humor to the text; but no, please don't get me wrong. If you have a good architecture, you can relax, and so can your web servers, database servers, engineers, customers, users, and the rest of the world. If this is not the situation, it means you should go back to work on your architecture.
Hey, what the heck have I been talking about here? It has been some paragraphs and three titles, and I did not use a single software term except "software". Such abstract chatter is for guys who have nothing to do all day but lazily walk around and talk and talk and talk... We are software people and we don't have any time for this. Hey, Hasan, cut it short!
Okay, I will try; but first of all, let's relax... then we can take a look at some real examples.
Let's say we are developing a web publishing service for professionals and also individuals. Every client will have a website of their own, running on our system. These websites can be ordinary personal websites with very low numbers of visitors, or, if our business gets lucky, some of our clients can be big publications like the NY Times.
We need to solve two kinds of scalability problems: scaling our business and our system as we start having more and more clients, running more and more websites; and scaling a single website as it gets more and more visitors, more and more data, and more and more applications running on that data. The first is a pretty easy problem compared to the second.
We can rewrite the question of "how to scale" as "how to divide" to see the solution more clearly. If we can divide something into small pieces, we can scale it by adding more resources to work on those pieces, growing horizontally.
We will have data and applications that will work on that data. Let's say we have one database server and one web server and try to make this scalable.
Think of the web servers we will run for our service: if we do not keep data on those machines, they turn into generic, equal components, little clients of the data back-end that interface this data to the rest of the world. By keeping our web servers light, silly, and empty, we can easily run many of them to handle an increasing number of requests.
Okay, turning web servers into just silly proxies is not the smartest idea. We need things to be done, applications to be run. And because, in our architecture, web servers are the easiest component to multiply, we will want to do as many things as possible on them. We will continue with this difficult problem under the "Dividing Smarter" title below. Before that, let's look at what kind of architecture we currently have on the table and in what way it is scalable.
We use load balancers to make many web servers run in parallel, and we also divide the many websites into groups, using DNS to direct them (even before requests hit our system) to multiple load balancers. For example: a.com, b.com, c.com to load balancer 1; a-very-big-website.com to load balancer 2; and so on. Each group of a load balancer, a set of web servers and a database server makes a separate universe in our system. Now we can have millions of websites and grow our system by adding more of these separate universes without any limits. We can serve as many clients with as many websites as our marketing department can bring. Our first problem is already solved. What about running big, big, big websites?
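To make the grouping concrete, here is a tiny sketch (every name in it is made up) of what such a universe looks like; in DNS, each hostname simply resolves to the load balancer of its universe:

```csharp
// Hypothetical model of the "separate universes" described above. In DNS,
// each hostname resolves to the address of its universe's load balancer:
//
//   a.com, b.com, c.com    -> lb1.example.net
//   a-very-big-website.com -> lb2.example.net
//
public record Universe(string LoadBalancer, string[] WebServers, string DatabaseServer);

public static class Universes
{
    // Growing the system means appending another universe;
    // nothing existing has to change.
    public static readonly Universe[] All =
    {
        new("lb1.example.net", new[] { "web1", "web2" },         "db1"),
        new("lb2.example.net", new[] { "web3", "web4", "web5" }, "db2"),
    };
}
```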
Dividing Smarter

Of course we cannot divide a single website into separate universes the way we do with separate websites; but this does not mean we cannot divide anything at all. We will continue dividing and conquering. To do this, we need a closer look at the problems we are solving.
What is a website? Web pages, supporting content like css and js files, multimedia content like image and video files, and data, lots of data. Thanks to CDNs and the awesome storage services that cloud computing systems provide, static files are not an important part of our problem anymore...
The real thing we do is rendering web pages. Above, we thought of our web servers as very light, generic interfaces to our database back-end. We have not solved how to run applications in our universes yet. Now it is time to do that.
Every request to our system will come to a site and be processed by an application running for that site. The very first thing our web servers will do is decide which site a request belongs to. On our database server, we keep a table that matches hostnames to sites. With every new client website, we add one or more domains to this table to match the site. With every request to our web servers, we query the database server and decide which site to load. Good?
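As a sketch of this first design (the table and column names dbo.SiteDomains, HostName, SiteId are invented for illustration), every request pays a database round trip before any real work happens:

```csharp
using System.Data.SqlClient;

public static class NaiveSiteResolver
{
    // One database round trip per request, just to find out which
    // site we are serving. Table and column names are invented.
    public static int ResolveSiteId(SqlConnection conn, string hostName)
    {
        using var cmd = new SqlCommand(
            "SELECT SiteId FROM dbo.SiteDomains WHERE HostName = @host", conn);
        cmd.Parameters.AddWithValue("@host", hostName);
        return (int)cmd.ExecuteScalar();
    }
}
```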
No, not good. It is awful. Why?
We have a small number of websites but a very big number of requests. The number of websites in a universe changes much less frequently than other kinds of data, like comments on blog sites; this table is updated maybe a few times a day in an established universe. Querying such a tiny database (a few thousand records, tiny!) for every request, again and again, all day, is not smart. The smart way is to keep copies of this table on the web servers and refresh them only when the table is updated. How do we know when the site list is updated? We can keep a row with a number as our table version number and increase this number with each update; or we can keep a timestamp of the last update. Web servers check the database server for this number and compare it to their local in-memory version. If the table is newer, they pull the data again, overwriting the local in-memory copy. This way, we will have reduced thousands of queries to tiny numbers. Big numbers, small numbers...
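Here is a minimal sketch of that version-checked cache, reusing the same invented table names; the full table is reloaded only when the stored version number changes:

```csharp
using System.Collections.Generic;
using System.Data.SqlClient;

public class SiteTableCache
{
    private readonly string _connectionString;
    private Dictionary<string, int> _sites = new();
    private long _version = -1;

    public SiteTableCache(string connectionString) =>
        _connectionString = connectionString;

    public int ResolveSiteId(string hostName)
    {
        RefreshIfStale();
        return _sites[hostName];
    }

    private void RefreshIfStale()
    {
        using var conn = new SqlConnection(_connectionString);
        conn.Open();

        // A tiny single-row query (it could even run on a timer instead of
        // per request); dbo.SiteTableVersion holds one BIGINT row that
        // writers bump on every change. All names are invented.
        using var versionCmd = new SqlCommand(
            "SELECT Version FROM dbo.SiteTableVersion", conn);
        var dbVersion = (long)versionCmd.ExecuteScalar();
        if (dbVersion == _version)
            return; // our in-memory copy is still fresh

        // Reload the whole (tiny) table and swap it in; reference
        // assignment keeps readers safe without locking.
        var fresh = new Dictionary<string, int>();
        using var cmd = new SqlCommand(
            "SELECT HostName, SiteId FROM dbo.SiteDomains", conn);
        using var reader = cmd.ExecuteReader();
        while (reader.Read())
            fresh[reader.GetString(0)] = reader.GetInt32(1);

        _sites = fresh;
        _version = dbVersion;
    }
}
```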
At this point, the materials we use in our buildings start to matter: which languages, what kind of platforms and database systems, and so on. Now they matter because they can make our architecture work better or worse. For example, for the update of the table, our database server could have a mechanism to notify web servers about the update. That way, we go even further and completely remove the unnecessary queries on the domain-site table. So, if the systems we have chosen provide such mechanisms, they are good choices for our architecture.
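In the .NET world of the question, one concrete mechanism of this kind is SQL Server's query notifications. A rough sketch, assuming Service Broker is enabled and using the same invented table as above:

```csharp
using System;
using System.Data.SqlClient;

public static class SiteTableWatcher
{
    // Let the database push a "the table changed" event instead of the
    // web server polling for it. Requires SQL Server with Service Broker
    // enabled; dbo.SiteDomains is the same invented table as above.
    public static void Watch(string connectionString, Action reloadCache)
    {
        SqlDependency.Start(connectionString);

        using var conn = new SqlConnection(connectionString);
        conn.Open();
        using var cmd = new SqlCommand(
            "SELECT HostName, SiteId FROM dbo.SiteDomains", conn);

        var dependency = new SqlDependency(cmd);
        // Notifications fire once; re-subscribe (call Watch again)
        // inside reloadCache after refreshing the in-memory copy.
        dependency.OnChange += (_, _) => reloadCache();

        using var reader = cmd.ExecuteReader(); // executing registers the subscription
        while (reader.Read()) { /* load rows into the cache */ }
    }
}
```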
Dividing things in a smart way happens automatically when we understand well what we want from our software. It is very difficult to scale database servers, since we need the data to be together. By increasing the number of web servers, we scale horizontally without any limits; but for the database server this is not applicable. The database server has to keep access to the data, and a single machine has limits beyond which we cannot scale it in an efficient way.
Every database system provides ways of scaling, like sharding or shared-nothing architectures. There may come a time when you have to use these; but from what I see on forums, blogs, and other places where people share their experiences, IMHO, people use them too aggressively and wrongly. They let their databases grow bigger and bigger and then say, "hey, it is time to scale, let's add some shards." 99% of these applications are blind-running: people throw their problems at software and expect them to be solved like magic. Unfortunately, they realize very soon that there is no magic.
We should keep ourselves from blind-running solutions by watching our numbers (big numbers, small numbers), by understanding our system's inner workings, and by solving problems via architecture instead of heavy use of materials.
Here is an architectural solution: Architected Solution (by Calatrava).
Here are other solutions that depend on materials instead of good architecture: [Blind-run Solution 1], [Blind-run Solution 2]
Judge the difference yourself.
How can we scale a database server? Instead of blindly dividing tables down the middle, we can rethink our data. Can we separate user account information from site templates? Of course, why not? Can we keep different database servers for old data and fresh data? A little more difficult, especially considering search facilities; but why not?
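As a sketch, such a split can start as nothing more than routing each kind of data to its own connection string (all server and database names invented):

```csharp
// Each functional slice of the data gets its own server, so each can be
// sized, tuned, and scaled on its own. All names are invented.
public static class ConnectionStrings
{
    public const string Accounts       = "Server=db-accounts;Database=Accounts;...";
    public const string Templates      = "Server=db-templates;Database=Templates;...";
    public const string LiveContent    = "Server=db-live;Database=Content;...";
    public const string ArchiveContent = "Server=db-archive;Database=Content;...";
}
```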
Divide smartly, not blindly! I accept there will be times when you cannot divide any further; but come on, how many of us are working for Google or Facebook?
-- Hey man, we have a very large data set and when we run...
-- Shush. First, go back and check your data set. Does it really have to be a large data set?
Most of the time, no, it does not. We just don't want to admit it...
Rebuilding everything from scratch takes time that many businesses cannot afford. The better way is to re-architect the current system without rewriting every component: separating pieces and redefining them as components. This is mostly analysis followed by small changes. Every function call in a system can easily be a point of division; we can cut the system at that point into two pieces.
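A tiny sketch of such a cut (all names hypothetical): a direct call is hidden behind an interface, so the implementation can later move behind a service boundary without the callers noticing:

```csharp
// Before the cut, callers construct the concrete class directly:
//
//   var tax = new TaxCalculator().Calculate(order);
//
// After the cut, they depend only on the seam:
public record Order(decimal Total);

public interface ITaxCalculator
{
    decimal Calculate(Order order);
}

// Today: the existing legacy logic, unchanged, sitting behind the seam.
public class LocalTaxCalculator : ITaxCalculator
{
    public decimal Calculate(Order order) => order.Total * 0.08m; // placeholder
}

// Tomorrow: the same seam, with the work done on another machine.
// public class RemoteTaxCalculator : ITaxCalculator { /* call the tax service */ }
```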
A happy look at your current system for just a few hours will give you a lot of ideas on how to divide it into pieces. Once you have divided them, it is very easy to re-architect everything and then rebuild your new system piece by piece. If I have a building and I need a bigger building on the same land, building the new one without moving out all the people already living there is a very difficult job; but not impossible. When it comes to software instead of buildings, it is much easier. So?
It is software. It is soft. You can copy your data, run tests on it, delete everything, and copy it a million more times. Once your architecture is designed well, your mistakes never cause catastrophic events. It is very difficult to turn a 6-seat dinner table into one that can serve 60 guests; but software is... software, and we can easily do such things. Relax.
-- The question above touches an area that is impossible to cover in just a few paragraphs. Based on this part of the question: "However I'm still open to more general do-s and dont-s and pointers suitable for a wider audience," I tried to mention things in a general format without diving into details. Although I tried to give a few tiny examples of practical applications of my principles, I know I have left many open ends in this short text. I appreciate any criticism and questions in the comments.
I'm interested in high level code and architecture changes required.
Unfortunately, there is no "correct" answer to how you should go about altering your architecture. The solution depends on what your current architecture looks like, as well as your capabilities and preferences as a developer. Some standalone systems may already HAVE a relatively scalable platform, others may need to make improvements as they start to gain traction, and still others may need to start over from scratch because their code base is unusable.
A solid code base is EXTREMELY IMPORTANT. Without an efficient and clean code base, it is unlikely that your architecture will ever scale desirably. Many companies make the mistake of putting band-aid after band-aid to solve short-term problems -- but in the long run, this never works out well. When something doesn't work right, take the time to fix it in the most logical way possible -- even if this means adjusting other code in your platform.
The best one can do is give you general advice on building a scalable system: use caching, design your system to scale horizontally, optimize your database, and use db indexing wherever appropriate. Some best practices are architecture-dependent, but there are general principles that pretty much every scalable platform needs to follow. For in-depth coverage of good scalability techniques and design patterns, I would check out Scalability Rules: 50 Principles for Scaling Web Sites.
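To make one of those concrete, here is a minimal read-through caching sketch (in C#, sticking with the .NET context of the question; the types, key format, and lifetime are made up):

```csharp
using System;
using System.Runtime.Caching;

public record Profile(int Id, string Name);

public static class Profiles
{
    // Read-through cache: serve hot data from memory and hit the database
    // only on a miss. Key format and the 5-minute lifetime are arbitrary.
    public static Profile GetProfile(int userId)
    {
        var cache = MemoryCache.Default;
        var key = $"profile:{userId}";

        if (cache.Get(key) is Profile cached)
            return cached;

        var profile = LoadProfileFromDatabase(userId); // the expensive path
        cache.Set(key, profile, DateTimeOffset.Now.AddMinutes(5));
        return profile;
    }

    // Hypothetical stand-in for the real data access code.
    private static Profile LoadProfileFromDatabase(int userId) => new(userId, "...");
}
```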
As far as choosing a platform goes, that is completely up to you and your preferences as a developer. What do you like to code in: C#, Ruby, PHP? Go with the language and platform your team agrees on. I prefer Ruby on Rails and I love the MVC design pattern, but that doesn't mean it's the best solution for you. Go with what makes the most sense to you, and work with it to develop a scalable system.
In the past, when I have worked on systems that require high scalability, there has often come a point where I have found the need to start over from scratch. Unfortunately, not everyone has the foresight to know all of the best practices and features they will require, and oftentimes this results in less-than-ideal database and platform designs. The process of developing a system gives a lot of insight into what is truly required and what the best methodologies are for that system. Thus, there have been a few times when I was halfway through a product, realized the need for a new code base, started over, and migrated whatever legacy code was appropriate for the new design.