
database independent data migration

My goal is to enable schema and data migration for an existing application.

This kind of question seems to have been asked many times, but with different requirements and circumstances than mine, I think.

Since I am inexperienced in this domain, allow me to lay out the architecture of the app and my assumptions first.

Architecture

The app is a multi-user, enterprise desktop application with a backend server that can persist to any major DB (MySQL, PostgreSQL, SQL Server, Oracle DB, etc.). It is assumed the DB is on-premise and maintained by our clients.

The tech stack is a fairly common Hibernate + Spring + RMI/JMS combo.

Currently, migrations are done by the server in the following way:

  • On server start it checks for the latest expected schema version
  • If larger than the current version, start migration to next version until current==latest:
    1. Create new database
    2. Load (whole) latest schema (SQL script with a lot of CREATE TABLE ...)
    3. Migrate data (in Java classes using two JDBC connections, one to the old and one to the new schema)
    4. Load (all) latest constraints (SQL script with a lot of ALTER TABLE ...)
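The version check at the top of this list can be sketched roughly as follows. This is only an illustration, not the application's actual code: the class name `MigrationPlanner` and the use of plain integer versions are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the start-up check: which target versions must be migrated to? */
final class MigrationPlanner {

    /**
     * Returns the target versions to migrate to, one step at a time and in order.
     * Forward-only: a database that is already up to date (or newer) yields no steps.
     */
    static List<Integer> plan(int currentVersion, int latestVersion) {
        List<Integer> targets = new ArrayList<>();
        for (int v = currentVersion + 1; v <= latestVersion; v++) {
            targets.add(v); // each target stands for steps 1-4: create, load schema, copy data, add constraints
        }
        return targets;
    }
}
```

For each returned target version the server would then run the four steps above against a fresh database.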

This migration is slow and forward-only, but it is simple. The problem is that, until now, the schema scripts and the queries in the data migrations have used MySQL syntax and features.

Note that by "migrate data" I mean: the backend server copies the data from the old schema to the new one, transforming it if necessary. Also, the migration process starts automatically on our clients' premises. This means we only have control over the JDBC connection, with no direct access to the database and no knowledge of the specific database being used (MySQL, SQL Server, ...).
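To make the copy step concrete, here is a minimal sketch that uses only standard JDBC and plain SELECT/INSERT statements. The class `TableCopier`, its method names, and the batch size are assumptions for illustration; the type mapping done by `getObject`/`setObject` can differ between drivers and would need checking per database.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper that copies one table from the old schema to the new one. */
final class TableCopier {

    /** Builds a parameterized INSERT without vendor-specific syntax. */
    static String insertSql(String table, List<String> columns) {
        String placeholders = String.join(", ", Collections.nCopies(columns.size(), "?"));
        return "INSERT INTO " + table + " (" + String.join(", ", columns) + ") VALUES (" + placeholders + ")";
    }

    /** Builds the matching SELECT for the same columns, in the same order. */
    static String selectSql(String table, List<String> columns) {
        return "SELECT " + String.join(", ", columns) + " FROM " + table;
    }

    /** Copies all rows in batches of 500 and returns the number of rows copied. */
    static int copy(Connection oldDb, Connection newDb, String table, List<String> columns)
            throws SQLException {
        int rows = 0;
        try (Statement read = oldDb.createStatement();
             ResultSet rs = read.executeQuery(selectSql(table, columns));
             PreparedStatement write = newDb.prepareStatement(insertSql(table, columns))) {
            while (rs.next()) {
                for (int i = 1; i <= columns.size(); i++) {
                    // A per-column transformation would go here.
                    write.setObject(i, rs.getObject(i));
                }
                write.addBatch();
                if (++rows % 500 == 0) {
                    write.executeBatch();
                }
            }
            write.executeBatch(); // flush the remaining rows
        }
        return rows;
    }
}
```

Because the statements are generated from table and column names only, nothing in this sketch depends on MySQL-specific features.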

The goal is to either replace or augment this migration scheme with a database independent one.

Assumptions and research

StackOverflow 1 2 3 4 5 6 7: The answers suggest using Hibernate's built-in feature. However, the docs state that it is not production-ready. Also, AFAICT, all answers are concerned with schema migration only.

Liquibase: Uses a custom DSL (in XML/JSON/YAML/etc) to allow for database independent schema migration only.

DBUnit: Uses a custom XML DSL to capture snapshots of database state. It cannot turn a snapshot taken at schema version 1 into one for version 2.

Flyway: In principle the same as Liquibase, but not database independent, because SQL scripts are used for migrations.

jOOQ: A database independent query DSL in Java on top of JDBC. Comparable to the Criteria API, but without the drawbacks of JPA. It should in principle allow for database independent data migration; however, it does not help with schema migration.

JPA-Query languages like HQL, JPQL, Criteria API are not sufficient because

  1. One cannot reference tables not mapped by the entity manager. E.g. join tables, metadata and audit tables.
  2. A copy of all versions of the Entity classes needs to be kept around for the mapping.

Question

I realize that, as this question stands now, it will be dismissed as opinion-based.

However, I am not necessarily looking for specific solutions to this problem (I doubt there is a clear solution for such a complex problem space) but rather to validate my assumptions.

Namely, is it true that

  • Liquibase and Flyway are mainly concerned with schema migration and data migration is left as an exercise for the reader?
  • in order for Flyway to support multiple, different databases, one needs to duplicate the migration scripts per database?
  • by and large, the problem of database independent data migration remains unresolved in enterprise Java?

Even if I were to combine Liquibase/Flyway with jOOQ, I do not see how to perform a data migration, because Liquibase/Flyway migrate databases in place. The old database gets destroyed and with it the opportunity to transform the old data to the new schema.

Thanks for your attention!

Oliver Jan Krylow asked Oct 26 '17 14:10




1 Answer

Let's break it down a little bit. You're right that this is largely opinion-based, but here's what I've noticed in my experience.

Liquibase and Flyway are mainly concerned with schema migration and data migration is left as an exercise for the reader?

You can do data migration with Liquibase and Flyway. It's something I've done pretty often. Take the example where I want to split a User table into User and Address tables. I'd write a migration script, which is basically just a SQL file, to create the new Address table and then copy all the relevant data into it.
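As a rough sketch of that split, the statements below create the new table, copy the data with INSERT ... SELECT, and only then drop the old columns, so the old data is still there while it is being transformed. The table and column names (`app_user`, `address`, `street`, `city`) and the class `SplitUserAddress` are made up for illustration, and the column types may need adjusting per database. The statements are wrapped in plain JDBC here; in Flyway the same statements would simply live in a versioned .sql file.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical in-place migration that splits address columns out of a user table. */
final class SplitUserAddress {

    /** Ordered statements: create the target, copy the data, then remove the old columns. */
    static final List<String> STATEMENTS = Collections.unmodifiableList(Arrays.asList(
            "CREATE TABLE address (user_id INTEGER NOT NULL, street VARCHAR(255), city VARCHAR(100))",
            "INSERT INTO address (user_id, street, city) SELECT id, street, city FROM app_user",
            "ALTER TABLE app_user DROP COLUMN street",
            "ALTER TABLE app_user DROP COLUMN city"));

    /**
     * Runs the statements in order. Some databases (MySQL and Oracle, for example) commit
     * DDL implicitly, so the rollback below cannot undo a partially applied migration there.
     */
    static void apply(Connection db) throws SQLException {
        boolean previousAutoCommit = db.getAutoCommit();
        db.setAutoCommit(false);
        try (Statement st = db.createStatement()) {
            for (String sql : STATEMENTS) {
                st.execute(sql);
            }
            db.commit();
        } catch (SQLException e) {
            db.rollback();
            throw e;
        } finally {
            db.setAutoCommit(previousAutoCommit);
        }
    }
}
```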

in order for Flyway to support multiple, different databases, one needs to duplicate the migration scripts per database?

Possibly. Flyway and Liquibase are better thought of as database versioning tools. If my app needs version 10 of the database, these tools would help me get to that point. Again, the migration scripts are just basic .sql files. If you're using some MySQL-specific functions, then those will just go in the migration script and they won't work on SQL Server.
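If you do end up with per-database scripts, the server can pick the right set from the JDBC connection alone. The sketch below is an assumption about how that could look, not something Flyway or Liquibase requires: the class `VendorScripts` and the folder names are invented, and the product names matched are the ones the common drivers typically report, so they should be verified against the drivers you actually ship.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.util.Locale;

/** Hypothetical lookup of a per-vendor script folder based on JDBC metadata. */
final class VendorScripts {

    /** Maps a reported product name to a script folder; unknown databases get the portable scripts. */
    static String folderFor(String productName) {
        String name = productName.toLowerCase(Locale.ROOT);
        if (name.contains("mysql") || name.contains("mariadb")) {
            return "mysql";
        }
        if (name.contains("postgresql")) {
            return "postgresql";
        }
        if (name.contains("sql server")) {
            return "sqlserver";
        }
        if (name.contains("oracle")) {
            return "oracle";
        }
        return "ansi";
    }

    /** Asks the driver which database is behind the connection. */
    static String detect(Connection db) throws SQLException {
        return folderFor(db.getMetaData().getDatabaseProductName());
    }
}
```

Scripts written in portable SQL would go into the fallback folder, and only the statements that really differ would need a vendor-specific copy.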

by and large, the problem of database independent data migration remains unresolved in enterprise Java?

Eh, I'm not sure about this one. I agree it's a problem, but in practice it's not a huge one. For the past 8+ years, I've only written ANSI SQL, which should be portable everywhere. So in theory, we can lift those applications onto a different database. JPA and its various implementations help with those differences. If your project was built with, say, all of its business logic in implementation-specific SQL functions, then it's going to be a headache. If you're using the database for CRUD, and I'd argue that's all you should be using it for, then it's not a huge deal.

So all that said, I think you might have the wrong idea about Flyway and Liquibase. Like I said earlier, they aren't really 'migration tools' so much as database versioning tools. With an ordered list of specific SQL migration scripts, I can guarantee the state of my database at any version. I'm not sure these are tools that I'd use to 'migrate' a legacy SQL Server based application into a Postgres based application.

tfecw answered Oct 16 '22 22:10