SaaS application needs to export/backup data to individual customer sites

We have a cloud-based SaaS application, and many of our customers (school systems) require that a backup of their data be stored on-site for them.

All of our application data is stored in a single MS SQL database. At the very top of the "hierarchy" we have an "Organization". This organization represents a single customer in our system. Each organization has many child tables/objects/records, each with FK relationships that ultimately lead back to "Organization".

We need a way to extract a SINGLE customer's data from the database and bundle it in some way so that it can be downloaded to the customer's site. Preferably as a SQL Server Express, SQLite, or Access database.

For example: Organization -> Skill Area -> Program -> Target -> Target Data are all tables in the system, each linking back to its parent by an FK. I need to get all of the target data, targets, programs, and skill areas for a single organization and export that data.
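To illustrate the shape of the problem, the extraction boils down to a chain of joins filtered by the organization, something like the sketch below (table and column names are simplified, and the Python/pyodbc is only for illustration; our real service would likely be C#):

```python
# Sketch only: hypothetical, simplified table/column names; assumes the
# pyodbc package and a SQL Server connection string.
import pyodbc

def fetch_org_data(org_id, conn_str):
    conn = pyodbc.connect(conn_str)
    cur = conn.cursor()

    # Each query narrows to a single organization by following the FK chain.
    skill_areas = cur.execute(
        "SELECT * FROM SkillArea WHERE OrganizationId = ?", org_id).fetchall()
    programs = cur.execute(
        "SELECT p.* FROM Program p "
        "JOIN SkillArea s ON p.SkillAreaId = s.Id "
        "WHERE s.OrganizationId = ?", org_id).fetchall()
    targets = cur.execute(
        "SELECT t.* FROM Target t "
        "JOIN Program p ON t.ProgramId = p.Id "
        "JOIN SkillArea s ON p.SkillAreaId = s.Id "
        "WHERE s.OrganizationId = ?", org_id).fetchall()
    # ...and so on down to TargetData. Each result set then needs to be
    # bundled into whatever portable format the customer receives.
    conn.close()
    return skill_areas, programs, targets
```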

Does anyone have any suggestions about how to do this within SQL Server, a C# service, or a third-party tool?

I need this solution to be easy to replicate for each customer who wants this feature turned on.

Ideas?

asked Feb 08 '12 by Chris Kooken


2 Answers

I'm a big fan of using messaging to propagate data at the moment, so here's a message-based solution that will allow external customers to keep a local, in-sync copy of the data which you provide on the web.

The basic architecture would be an online, password-secured, customer-specific list of the changes which have occurred in the system. On the server side, this list is appended to any time an entity relevant to that customer changes. On the client side, an application checks the list for any changes it hasn't yet received and then applies them to its local database (in the order they occurred).
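As a rough illustration of the server side, every relevant write would also record an ordered event for the owning organisation. A minimal sketch, assuming a hypothetical ChangeEvent table and an open pyodbc connection to the central SQL Server database:

```python
# Sketch only: ChangeEvent is a hypothetical table; replay order comes from
# an identity column / CreatedAt timestamp. conn is an open pyodbc connection.
import json

def record_event(conn, org_id, event_type, payload):
    conn.cursor().execute(
        "INSERT INTO ChangeEvent (OrganizationId, EventType, Payload, CreatedAt) "
        "VALUES (?, ?, ?, SYSUTCDATETIME())",
        org_id, event_type, json.dumps(payload))
    conn.commit()

# e.g. when a new skill area is saved for "my org":
# record_event(conn, my_org_id, "SkillAreaExists",
#              {"id": 42, "name": "Reading Comprehension"})
```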

There are a bunch of different ways of doing the list-based component of the system, but my gut feeling is that you would be best served by something like RSS.

Below is a practical scenario of how this could work (a sketch of the client side follows the list):

  1. A new skill area is created for organisation "my org"
  2. The skill area is added to the central database and associated with the "my org" record
  3. A SkillAreaExists event is also added at the same time to the "my org" RSS feed, with JSON or XML data specifying the properties of the new skill area
  4. A new program is added to the skill area that was just created
  5. The program is added to the central database and associated with the skill area
  6. A ProgramExists event is also added at the same time to the "my org" RSS feed, with JSON or XML data specifying the properties of the new program
  7. A SkillAreaHasProgram event is also added at the same time, with JSON or XML data specifying identifiers for the skill area and program
  8. The client agent checks the RSS feed, sees the new messages, and processes them in order
  9. When the SkillAreaExists event is processed, a new skill area is added to the local DB
  10. When the ProgramExists event is processed, a new program is added to the local DB
  11. When the SkillAreaHasProgram event is processed, the program is linked to the skill area
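A minimal sketch of the client agent, assuming the per-organisation change list is exposed as a JSON array over HTTPS (the endpoint URL, event names, and local table names are hypothetical; an RSS/Atom feed would be handled the same way) and the local copy lives in SQLite:

```python
# Sketch only: polls a hypothetical change-list endpoint and applies each
# event, in order, to a local SQLite database. applied_events is a local
# bookkeeping table recording which events have already been processed.
import json
import sqlite3
import urllib.request

FEED_URL = "https://example.com/feeds/my-org/changes?after="  # hypothetical

def sync(db_path):
    db = sqlite3.connect(db_path)
    last = db.execute(
        "SELECT COALESCE(MAX(event_id), 0) FROM applied_events").fetchone()[0]

    with urllib.request.urlopen(FEED_URL + str(last)) as resp:
        events = json.loads(resp.read())

    for ev in events:  # events arrive in the order they occurred
        body = ev["data"]
        if ev["type"] == "SkillAreaExists":
            db.execute("INSERT INTO skill_area (id, name) VALUES (?, ?)",
                       (body["id"], body["name"]))
        elif ev["type"] == "ProgramExists":
            db.execute("INSERT INTO program (id, name) VALUES (?, ?)",
                       (body["id"], body["name"]))
        elif ev["type"] == "SkillAreaHasProgram":
            db.execute("UPDATE program SET skill_area_id = ? WHERE id = ?",
                       (body["skill_area_id"], body["program_id"]))
        db.execute("INSERT INTO applied_events (event_id) VALUES (?)",
                   (ev["id"],))
        db.commit()  # commit per event so an interrupted sync can resume
```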

This approach has a whole bunch of benefits over traditional point-in-time replication.

  • It's online; a consumer can get real-time updates if required.
  • Consistency is maintained by ordering: at any point in the event stream, if you stop receiving events you have a local DB which accurately reflects the central DB as of some point in time.
  • It's diff-based; you only need to receive changes.
  • It's auditable; you can see what actually happened, not just the current state.
  • It's easily recoverable; if there's a data consistency issue you can rebuild the entire DB by replaying the event stream.
  • It allows for multiple consumers; lots of individual copies of the client's info can exist and function autonomously.

We have had a great deal of success with these techniques for replicating data between sites, especially when they are only intermittently online.

answered by Not loved


While there are some very interesting enterprise solutions that have been suggested, I think my approach would be to develop a plain old scheduled backup solution that simply exports the data for each organisation with a stored procedure or just a number of SELECT statements.
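As a rough sketch of what that job could look like, copying one organisation's rows from the central SQL Server database into a throwaway SQLite file (the Export_* views and pyodbc usage here are assumptions, not an existing part of the system):

```python
# Sketch only: Export_<Table> is a hypothetical view per table that flattens
# the FK chain so every row carries the OrganizationId it belongs to.
import sqlite3
import pyodbc

TABLES = ["SkillArea", "Program", "Target", "TargetData"]  # illustrative list

def export_org_to_sqlite(org_id, conn_str, out_path):
    src = pyodbc.connect(conn_str)
    dst = sqlite3.connect(out_path)
    for table in TABLES:
        cur = src.cursor()
        cur.execute(
            "SELECT * FROM Export_%s WHERE OrganizationId = ?" % table, org_id)
        cols = [c[0] for c in cur.description]
        # Untyped columns are fine for a portable SQLite snapshot.
        dst.execute("CREATE TABLE IF NOT EXISTS %s (%s)"
                    % (table, ", ".join(cols)))
        placeholders = ", ".join("?" for _ in cols)
        dst.executemany("INSERT INTO %s VALUES (%s)" % (table, placeholders),
                        [tuple(row) for row in cur.fetchall()])
    dst.commit()
    dst.close()
    src.close()
```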

Admittedly you'll have to keep this up to date as your database schema changes, but if this is a production application I can't imagine it changes that drastically.

There are any number of technologies available to do this, be it SSIS, a custom Windows service, or even something as rudimentary as a scheduled task that kicks off a stored procedure from the command line.

The format you choose to export to is entirely up to you and should probably be driven by how the backup is intended to be used. I might consider writing data to a number of CSV files and zipping the result such that it could be imported into other platforms should the need arise.
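A standard-library-only sketch of the CSV-and-zip variant (the result_sets shape is an assumption about what the export step hands over):

```python
# Sketch only: writes each exported result set to a CSV file and bundles them
# into a single zip archive the customer can download.
import csv
import zipfile
from pathlib import Path

def bundle_csvs(result_sets, out_dir, zip_name):
    """result_sets: {table_name: (column_names, rows)} from the export step."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(out_dir / zip_name, "w", zipfile.ZIP_DEFLATED) as zf:
        for table, (cols, rows) in result_sets.items():
            csv_path = out_dir / ("%s.csv" % table)
            with open(csv_path, "w", newline="") as f:
                writer = csv.writer(f)
                writer.writerow(cols)
                writer.writerows(rows)
            zf.write(csv_path, arcname=csv_path.name)
```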

Another option might be to copy the data across to a scratch database and then simply create a SQL backup of that database.
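If you go that route, the final step can be as simple as issuing a native backup once the scratch copy is populated. A sketch, assuming pyodbc and a per-customer scratch database (both are assumptions):

```python
# Sketch only: produces a .bak file from a hypothetical per-customer scratch
# database. BACKUP DATABASE cannot run inside a transaction, hence autocommit.
import pyodbc

def backup_scratch_db(conn_str, scratch_db, out_path):
    conn = pyodbc.connect(conn_str, autocommit=True)
    # Real code should validate/escape the database name and path rather than
    # formatting them straight into the statement.
    conn.cursor().execute(
        "BACKUP DATABASE [%s] TO DISK = N'%s'" % (scratch_db, out_path))
    conn.close()
```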

However you choose to go about it, I would encourage you to ensure that the process is well documented and has as much automated installation and setup as possible. Systems with loosely coupled dependencies such as common file locations or scheduled tasks are prone to getting tweaked and changed over time. Without those tweaks and changes being recorded, you can create a system that works but can't be replicated. Soon no one wants to touch it and no one remembers exactly how it works. When it eventually needs changing, or worse breaks, you have to start reverse engineering before you can fix it.

In a cloud-based environment this is especially important because you want to be able to deploy as quickly as possible. If there is a lot of configuration that needs to be done, you're likely to make mistakes or just be inconsistent. By creating a nuke-and-repave deployment, you have a single point where you can change installation and configuration, safe in the knowledge that the change will be consistent across every deployment.

answered by Sam Greenhalgh