
Keeping partly-offline sqlite db in sync with postgresql

This question is more on architecture and libs, than on implementation.

I am currently working on a project which requires a local long-term cache storage at the client (updated once a day), kept in sync with a remote db at the server. For the client side, sqlite has been chosen as a lightweight approach, and postgresql as the feature-rich db at the server. Native replication mechanisms of postgres are not an option because I need to keep the client really lightweight and free of reliance on external components like db servers.

The implementation language would be Python. Now I'm looking at ORMs like SQLAlchemy, but haven't worked with any before.

  • Does SQLAlchemy have any tools to keep sqlite and postgres dbs in sync?
  • If not, are there any other Python libraries which have such tools?
  • Any ideas about what the architecture should look like, if the task must be solved "by hand"?
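For concreteness, here is the kind of two-backend setup I have in mind, using SQLAlchemy Core (the `readings` table is a made-up example, and the postgres DSN is just a placeholder):

```python
from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String

# The same SQLAlchemy Core code can target either backend, which at least
# keeps the two schemas defined in one place.
sqlite_engine = create_engine("sqlite:///:memory:")  # client side
# pg_engine = create_engine("postgresql://user:pass@host/dbname")  # server side

metadata = MetaData()
readings = Table(
    "readings", metadata,
    Column("id", Integer, primary_key=True),
    Column("value", String),
)

# Create the schema and insert a row on the sqlite side.
metadata.create_all(sqlite_engine)
with sqlite_engine.begin() as conn:
    conn.execute(readings.insert(), [{"value": "42"}])
    rows = conn.execute(readings.select()).fetchall()
print(len(rows))
```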

Added: It's like telemetry, since the client will have an internet connection for only approximately 20 minutes a day.

So, the main question is about the architecture of such a system.

thodnev asked Aug 15 '16 14:08



1 Answer

It doesn't usually fall within the tasks of an ORM to sync data between databases, so you will likely have to implement it yourself. I don't know of any solution that will handle syncing for you given your choice of databases.

There are a couple important design choices to consider:

  • how do you figure out what data changed (i.e. inserted, updated or deleted)?
  • what is the most efficient way to package the change log?
  • will you have to deal with conflicts, and if so, how?

The most efficient way to figure out what changed is to have the database tell you directly. Bottled Water can offer some inspiration in this regard: the idea is to tap into the event log postgres would use for replication. You will need something like Kafka to keep track of what each of your clients already knows. This will allow you to optimize your server for writes, as you won't have clients running queries to figure out what changed since they were last online. The same can be achieved on the SQLite end with event callbacks; you'll just have to trade some storage space on the client to retain the changes to be sent to the server. If that sounds like too much infrastructure for your needs, it's something you can easily implement with SQL and polling as well, but I would still think of it as an event log, and consider how it's implemented a detail, possibly allowing for a more efficient implementation later on.
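On the client side, one way to keep such an event log without any extra infrastructure is to let SQLite triggers append to a changelog table. A minimal sketch, assuming a hypothetical `readings` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE readings (id INTEGER PRIMARY KEY, value TEXT);

-- Every change to readings is recorded here, in order, until it has
-- been shipped to the server during the daily online window.
CREATE TABLE changelog (
    seq    INTEGER PRIMARY KEY AUTOINCREMENT,
    op     TEXT NOT NULL,          -- 'I', 'U' or 'D'
    row_id INTEGER NOT NULL
);

CREATE TRIGGER readings_ins AFTER INSERT ON readings
    BEGIN INSERT INTO changelog(op, row_id) VALUES ('I', NEW.id); END;
CREATE TRIGGER readings_upd AFTER UPDATE ON readings
    BEGIN INSERT INTO changelog(op, row_id) VALUES ('U', NEW.id); END;
CREATE TRIGGER readings_del AFTER DELETE ON readings
    BEGIN INSERT INTO changelog(op, row_id) VALUES ('D', OLD.id); END;
""")

conn.execute("INSERT INTO readings(value) VALUES ('a')")
conn.execute("UPDATE readings SET value = 'b' WHERE id = 1")
ops = [row[0] for row in conn.execute("SELECT op FROM changelog ORDER BY seq")]
print(ops)  # ['I', 'U']
```

After a successful upload, the client deletes the changelog rows it has shipped (keeping the last acknowledged `seq` makes retries safe).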

The best way to structure and package your change log will depend on your application's requirements, available bandwidth, etc. You could use standard formats such as JSON, and compress and encrypt if needed.
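As an illustration of that packaging step, here is a round trip of a (made-up) batch of change-log entries through JSON plus gzip, using only the standard library:

```python
import gzip
import json

# Hypothetical batch of change-log entries to ship during the daily window.
batch = [
    {"seq": 1, "op": "I", "table": "readings", "row": {"id": 1, "value": "a"}},
    {"seq": 2, "op": "U", "table": "readings", "row": {"id": 1, "value": "b"}},
]

# Serialize and compress for transfer...
payload = gzip.compress(json.dumps(batch).encode("utf-8"))

# ...and reverse the process on the receiving end.
restored = json.loads(gzip.decompress(payload).decode("utf-8"))
print(restored == batch)  # True
```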

It will be much simpler to design your application so as to avoid conflicts: possibly flow data in a single direction, or partition your data so that it always flows in a single direction for a specific partition.

One final thought is that with such an architecture you would be getting incremental updates, some of which might be missed for unplanned reasons (system failure, bugs, dropped messages, etc.). You could have some built-in heuristic to check that your data matches, like at least checking the number of records on each side, with some way to recover from such a fault; at a minimum, a way to manually re-fetch the data from the authoritative source, i.e. if the server is authoritative, the client should be able to discard its data and re-fetch it. You might need such a mechanism anyway for cases when the client is reinstalled, etc.
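That per-table record-count heuristic could look something like this (the table names and counts are made up; in practice each side would run `SELECT COUNT(*)` against its own database):

```python
def tables_out_of_sync(client_counts, server_counts):
    """Return the tables whose client row count differs from the server's.

    Both arguments map table name -> row count; a mismatch flags a table
    as a candidate for a full re-fetch from the authoritative server.
    """
    return [t for t in server_counts if client_counts.get(t) != server_counts[t]]

# Hypothetical counts gathered on each side after the daily sync.
server = {"readings": 120, "settings": 4}
client = {"readings": 118, "settings": 4}
print(tables_out_of_sync(client, server))  # ['readings']
```

Row counts are a cheap first check; per-table checksums over primary keys would catch mismatches that counts alone miss, at a higher query cost.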

Alpar answered Oct 18 '22 12:10