Data Migration from Legacy Data Structure to New Data Structure

OK, so here is the problem we are facing.

Currently:

  1. We have a ton of Legacy Applications that have direct database access
  2. The data structure in the database is not normalized
  3. The current process / structure is used by almost all applications

What we are trying to implement:

  1. Move all functionality to a RESTful service so no application has direct database access
  2. Implement a normalized data structure

The problem we are having is how to implement this migration, not only for the applications but for the database as well.

Our current solution is to:

  1. Identify all the CRUD functionality and implement this in the new Web Service
  2. Create the new Applications to replace the Legacy Apps
  3. Point the New Applications to the new Web Service (still pointing to the old Data Structure)
  4. Migrate the data in the databases to the new Structure
  5. Point the New Applications to the new Web Service (now pointing to the new Data Structure)

But as we discuss this process, it looks like we will have to write the new Web Service twice: once for the old Data Structure and once for the new Data Structure, because at the moment we cannot make the old Data Structure fit the new Data Structure for the new Web Service.

I would like to know if anyone has faced challenges like this, and how you overcame them.

Asked Nov 17 '12 by Phill Pafford



2 Answers

EDIT: More explanation of synchronization using bi-directional triggers; updates for syntax, language and clarity.

Preamble

I have faced similar problems in a data model upgrade on a large web application I worked on for 7 years, so I feel your pain. From this experience, I would propose something a bit different - but hopefully something that will be a lot easier to implement. But first, an observation:

The value to the organisation is the data: the data will long outlive all your current applications. The business will constantly invent new ways of getting value out of the data it has captured, which will engender new reports, applications and ways of doing business.

So getting the new data structure right should be your most important goal. Don't trade getting the structure right against other short-term development goals, especially:

  • Operational goals such as rolling out a new service
  • Report performance (use materialized views, triggers or batch jobs instead)

This structure will change over time so your architecture must allow for frequent additions and infrequent normalizations to it. This means that your data structure and any shared APIs to it (including RESTful services) must be properly versioned.

Why RESTful web services?

You mention that you will "Move all functionality to a RESTful service so no application has direct database access". I need to ask a very important question with respect to the legacy apps: why is this important, and what value has it brought?

I ask because:

  • You lose ACID transactions (each call is a single transaction unless you implement some horrifically complicated WS-* standards)
  • Performance degrades: direct database connections will be faster (no web server work or translation to do) and have lower latency (typically 1 ms rather than 50-100 ms); the extra latency will visibly reduce responsiveness in applications written for direct DB connections
  • The database structure is not actually abstracted by the RESTful service: you acknowledge that, with the database normalization, you have to rewrite the web services and the applications calling them.

And the other cross-cutting concerns are unchanged:

  • Manageability: direct database connections can be monitored and managed with many generic tools
  • Security: direct connections are more secure than web services that your developers will write
  • Authorization: the database permission model is very advanced and as fine-grained as you could want
  • Scalability: the web service is itself a direct-connected database application (perhaps the only one), and so scales only as far as the database does

You can migrate the database and keep the legacy applications running by maintaining a legacy RESTful API. But what if we could keep the legacy apps running without introducing a 'legacy' RESTful service?

Database versioning

Presumably the majority of the 'legacy' applications use SQL to directly access data tables; you may have a number of database views as well.

One approach to the data migration is to have the new database (with the normalized structure in a new schema) present the old structure to the legacy applications as views, typically in a separate schema.
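As a purely illustrative sketch (the schema, table and column names below are hypothetical, not taken from the question), such a compatibility view might look like this in MySQL, rebuilding the old flat employee table from normalized tables held in a new schema:

    -- Hypothetical example: legacy apps used to read a flat EMP table with the
    -- department name denormalized into every row. The normalized tables live
    -- in the `newdb` schema; the `legacydb` schema holds only compatibility views.
    CREATE VIEW legacydb.emp AS
    SELECT e.emp_id     AS EMPNO,
           e.full_name  AS ENAME,
           d.dept_name  AS DEPTNAME,  -- denormalized column the legacy apps expect
           e.salary     AS SAL
    FROM   newdb.employee   e
    JOIN   newdb.department d ON d.dept_id = e.dept_id;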

This is actually quite easy to implement, but it solves only reporting and read-only functionality. What about legacy application DML? DML can be handled using:

  • Updatable views for simple transformations
  • Introducing stored procedures where updatable views are not possible, e.g. "CALL insert_emp(?, ?, ?)" rather than "INSERT INTO EMP (col1, col2, col3) VALUES (?, ?, ?)" (a sketch follows this list)
  • Having a 'legacy' table that synchronizes with the new database via triggers and DB links
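As a purely illustrative sketch of the stored-procedure option (continuing the hypothetical schema from the view example above), insert_emp keeps the legacy call signature while writing into the normalized tables:

    -- Hypothetical sketch: the legacy app calls insert_emp(name, dept_name, salary)
    -- instead of inserting into the old flat EMP table. The procedure resolves the
    -- department by name and writes to the normalized tables in `newdb`.
    DELIMITER //
    CREATE PROCEDURE legacydb.insert_emp(
        IN p_name      VARCHAR(100),
        IN p_dept_name VARCHAR(100),
        IN p_salary    DECIMAL(10,2)
    )
    BEGIN
        DECLARE v_dept_id INT;

        -- Find the department the legacy row referred to by name
        SELECT dept_id INTO v_dept_id
        FROM   newdb.department
        WHERE  dept_name = p_dept_name;

        -- Create it if the legacy app sends a department we have not seen before
        IF v_dept_id IS NULL THEN
            INSERT INTO newdb.department (dept_name) VALUES (p_dept_name);
            SET v_dept_id = LAST_INSERT_ID();
        END IF;

        INSERT INTO newdb.employee (full_name, dept_id, salary)
        VALUES (p_name, v_dept_id, p_salary);
    END //
    DELIMITER ;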

Having a legacy-format table with bi-directional synchronization to the new-format table(s) using triggers is a brute-force solution and relatively ugly (a minimal one-way sketch follows the list below).

You end up with identical data in two different schemas (or databases) and the possibility of data going out-of-sync if the synchronization code has bugs - and then you have the classic issues of the "two master" problem. As such, treat this as a last resort, for example when:

  • The fundamental structure has changed (for example, changing the cardinality of a relation), or
  • The translation to the legacy format is a complex function (e.g. if the legacy column is the square of the new-format column value and is set to "4", an updatable view cannot determine whether the correct value is +2 or -2).
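Here is that one-way sketch in MySQL (legacydb.emp is now a real table rather than a view, and the names remain hypothetical); a mirror trigger on newdb.employee, plus a guard to stop the two triggers re-firing each other, would also be needed:

    -- Hypothetical one-way sketch: keep newdb.employee in step with writes that
    -- legacy applications still make directly to the legacydb.emp table.
    CREATE TRIGGER legacydb.emp_after_insert
    AFTER INSERT ON legacydb.emp
    FOR EACH ROW
        INSERT INTO newdb.employee (emp_id, full_name, dept_id, salary)
        SELECT NEW.EMPNO, NEW.ENAME, d.dept_id, NEW.SAL
        FROM   newdb.department d
        WHERE  d.dept_name = NEW.DEPTNAME;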

When such changes are required in your data, there will be some significant change in code and logic somewhere. You could implement it in a compatibility layer (advantage: no change to legacy code) or change the legacy app (advantage: the data layer stays clean). This is a technical decision for the engineering team.

Creating a compatibility database of the legacy structure using the approaches outlined above minimizes changes to legacy applications (in some cases, a legacy application continues without any code change at all). This greatly reduces development and testing costs (for which there is no net functional gain to the business), and greatly reduces rollout risk.

It also allows you to concentrate on the real value to the organisation:

  • The new database structure
  • New RESTful web services
  • New applications (potentially built using the RESTful web services)

Positive aspects of web services

Please don't read the above as a diatribe against web services, especially RESTful web services. When used for the right reason, such as for enabling web applications or integration between disparate systems, this is a good architectural solution. However, it might not be the best solution for managing your legacy apps during the data migration.

Answered by Andrew Alcock


What it seems like you ought to do is define a new data model ("normalized") and build a mapping from the normalized model back to the legacy model. Then you can replace legacy direct calls with calls on the normalized one at your leisure. This breaks no code.

In parallel, you need to define what amounts to a (centralized) legacy db API, and map it to your normalized model. Now, at your leisure, replace the original legacy db calls with calls on the legacy db API. This breaks no code.
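As a small, purely hypothetical illustration of what a call on such a legacy db API could look like next to the direct access it replaces (routine and table names are invented):

    -- Before: the application issues SQL directly against the legacy table
    SELECT EMPNO, ENAME, DEPTNAME, SAL
    FROM   EMP
    WHERE  EMPNO = 42;

    -- After: the same read goes through one named routine; that routine can later
    -- be re-pointed at the normalized model without touching the application
    CALL legacy_api_get_emp(42);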

Once the original calls are completely replaced, you can switch the data model over to the real normalized one. This should break no code, since everything is now going against the legacy db API or the normalized db API.

Finally, you can replace the legacy db API calls and related code, with revised code that uses the normalized data API. This requires careful recoding.

To speed all this up, you want an automated code transformation tool to implement the code replacements.

This document seems to have a good overview: http://se-pubs.dbs.uni-leipzig.de/files/Cleve2006CotransformationsinDatabaseApplicationsEvolution.pdf

Answered by Ira Baxter