
Keeping partly-offline sqlite db in sync with postgresql

This question is more on architecture and libs, than on implementation.

I am currently working on a project which requires a local long-term cache storage at the client (updated once a day), kept in sync with a remote db at the server. For the client side, sqlite has been chosen as a lightweight approach, and postgresql as the feature-rich db at the server. Native replication mechanisms of postgres are not an option because I need to keep the client really lightweight and free of reliance on external components like db servers.

The implementation language would be Python. Now I'm looking at ORMs like SQLAlchemy, but haven't worked with any before.

  • Does SQLAlchemy have any tools to keep sqlite and postgres dbs in sync?
  • If not, are there any other Python libraries which have such tools?
  • Any ideas about what the architecture should look like, if the task must be solved "by hand"?
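For concreteness, here is the kind of two-backend setup I have in mind, using SQLAlchemy Core (the `readings` table is a made-up example, and the postgres DSN is just a placeholder):

```python
from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String

# The same SQLAlchemy Core code can target either backend, which at least
# keeps the two schemas defined in one place.
sqlite_engine = create_engine("sqlite:///:memory:")  # client side
# pg_engine = create_engine("postgresql://user:pass@host/dbname")  # server side

metadata = MetaData()
readings = Table(
    "readings", metadata,
    Column("id", Integer, primary_key=True),
    Column("value", String),
)

# Create the schema and insert a row on the sqlite side.
metadata.create_all(sqlite_engine)
with sqlite_engine.begin() as conn:
    conn.execute(readings.insert(), [{"value": "42"}])
    rows = conn.execute(readings.select()).fetchall()
print(len(rows))
```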

Added: It's like telemetry, since the client will have an internet connection for only approximately 20 minutes a day.

So, the main question is about the architecture of such a system.

thodnev asked Aug 15 '16 14:08



1 Answer

It doesn't usually fall within the tasks of an ORM to sync data between databases, so you will likely have to implement it yourself. I don't know of any solution that will handle syncing for you given your choice of databases.

There are a couple important design choices to consider:

  • how do you figure out what data changed (i.e. inserted, updated or deleted)?
  • what is the most efficient way to package the change log?
  • will you have to deal with conflicts, and if so, how?

The most efficient way to figure out what changed is to have the database tell you directly. Bottled Water can offer some inspiration in this regard: the idea is to tap into the event log postgres would use for replication. You will need something like Kafka to keep track of what each of your clients already knows. This will allow you to optimize your server for writes, as you won't have clients running queries to figure out what changed since they were last online. The same can be achieved on the SQLite end with event callbacks; you'll just have to trade some storage space on the client to retain the changes to be sent to the server. If that sounds like too much infrastructure for your needs, it's something you can easily implement with SQL and polling as well, but I would still think of it as an event log, and consider how it's implemented a detail, possibly allowing for a more efficient implementation later on.
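On the client side, one way to keep such an event log without any extra infrastructure is to let SQLite triggers append to a changelog table. A minimal sketch, assuming a hypothetical `readings` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE readings (id INTEGER PRIMARY KEY, value TEXT);

-- Every change to readings is recorded here, in order, until it has
-- been shipped to the server during the daily online window.
CREATE TABLE changelog (
    seq    INTEGER PRIMARY KEY AUTOINCREMENT,
    op     TEXT NOT NULL,          -- 'I', 'U' or 'D'
    row_id INTEGER NOT NULL
);

CREATE TRIGGER readings_ins AFTER INSERT ON readings
    BEGIN INSERT INTO changelog(op, row_id) VALUES ('I', NEW.id); END;
CREATE TRIGGER readings_upd AFTER UPDATE ON readings
    BEGIN INSERT INTO changelog(op, row_id) VALUES ('U', NEW.id); END;
CREATE TRIGGER readings_del AFTER DELETE ON readings
    BEGIN INSERT INTO changelog(op, row_id) VALUES ('D', OLD.id); END;
""")

conn.execute("INSERT INTO readings(value) VALUES ('a')")
conn.execute("UPDATE readings SET value = 'b' WHERE id = 1")
ops = [row[0] for row in conn.execute("SELECT op FROM changelog ORDER BY seq")]
print(ops)  # ['I', 'U']
```

After a successful upload, the client deletes the changelog rows it has shipped (keeping the last acknowledged `seq` makes retries safe).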

The best way to structure and package your change log will depend on your application's requirements, available bandwidth, etc. You could use standard formats such as JSON, and compress and encrypt if needed.
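As an illustration of that packaging step, here is a round trip of a (made-up) batch of change-log entries through JSON plus gzip, using only the standard library:

```python
import gzip
import json

# Hypothetical batch of change-log entries to ship during the daily window.
batch = [
    {"seq": 1, "op": "I", "table": "readings", "row": {"id": 1, "value": "a"}},
    {"seq": 2, "op": "U", "table": "readings", "row": {"id": 1, "value": "b"}},
]

# Serialize and compress for transfer...
payload = gzip.compress(json.dumps(batch).encode("utf-8"))

# ...and reverse the process on the receiving end.
restored = json.loads(gzip.decompress(payload).decode("utf-8"))
print(restored == batch)  # True
```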

It will be much simpler to design your application so as to avoid conflicts: possibly flow data in a single direction, or partition your data so that it always flows in a single direction for a specific partition.

One final thought is that with such an architecture you would be getting incremental updates, some of which might be missed for unplanned reasons (system failure, bugs, dropped messages, etc.). You could have some built-in heuristic to check that your data matches, like at least checking the number of records on each side, with some way to recover from such a fault; at a minimum, a way to manually re-fetch the data from the authoritative source, i.e. if the server is authoritative, the client should be able to discard its data and re-fetch it. You might need such a mechanism anyway for cases when the client is reinstalled, etc.
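That per-table record-count heuristic could look something like this (the table names and counts are made up; in practice each side would run `SELECT COUNT(*)` against its own database):

```python
def tables_out_of_sync(client_counts, server_counts):
    """Return the tables whose client row count differs from the server's.

    Both arguments map table name -> row count; a mismatch flags a table
    as a candidate for a full re-fetch from the authoritative server.
    """
    return [t for t in server_counts if client_counts.get(t) != server_counts[t]]

# Hypothetical counts gathered on each side after the daily sync.
server = {"readings": 120, "settings": 4}
client = {"readings": 118, "settings": 4}
print(tables_out_of_sync(client, server))  # ['readings']
```

Row counts are a cheap first check; per-table checksums over primary keys would catch mismatches that counts alone miss, at a higher query cost.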

Alpar answered Oct 18 '22 12:10