Can you share your thoughts how would you implement data versioning in PostgreSQL. (I've asked similar question regarding Cassandra and MongoDB. If you have any thoughts which db is better for that please share)
Suppose that I need to version records in a simple address book. Address book records are stored in one table without relations for simplicity. I expect that the history:
I'm considering the following approaches:
Create a new object table to store history of records with a copy of schema of addressbook table and add timestamp and foreign key to address book table.
Create a kind of schema less table to store changes to address book records. Such table would consist of: AddressBookId, TimeStamp, FieldName, Value. This way I would store only changes to the records and I wouldn't have to keep history table and address book table in sync.
Create a table to store seralized (JSON) address book records or changes to address book records. Such table would looks as follows: AddressBookId, TimeStamp, Object (varchar). Again this is schema less so I wouldn't have to keep the history table with address book table in sync. (This is modelled after Simple Document Versioning with CouchDB)
If you really have two distinct PostgreSQL databases, the common way of transferring data from one to another would be to export your tables (with pg_dump -t ) to a file, and import them into the other database (with psql ).
In the case of research data, a new version of a dataset may be created when an existing dataset is reprocessed, corrected or appended with additional data. Versioning is one means by which to track changes associated with 'dynamic' data that is not static over time.
Versioning a database means sharing all changes of a database that are neccessary for other team members in order to get the project running properly. Database versioning starts with a settled database schema (skeleton) and optionally with some data.
I do something like your second approach: have the table with the actual working set and a history with changes (timestamp, record_id, property_id, property_value). This includes the creation of records. A third table describes the properties (id, property_name, property_type), which helps in data conversion higher up in the application. So you can also track very easily changes of single properties.
Instead of a timestamp you could also have an int-like, wich you increment for every change per record_id, so you have an actual version.
You could have start_date
and end_date
.
When end_date
is NULL, it`s the actual record.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With