
How can I incrementally back up a SQLite database?

Tags:

sqlite

backup

I want to back up the database at high frequency, but the cost of a full backup is not acceptable. SQLite does not seem to have an API for doing incremental backups directly, but I found the Data Change Notification Callbacks, which seem helpful: https://www.sqlite.org/c3ref/update_hook.html

The parameters of the callback are the type of operation, the name of the database, the name of the table, and the rowid. However, I don't know whether it is possible to generate backup information (an SQL statement, for example) for that row from these parameters alone, without knowing the table's structure. In other words: is there a general method to generate a backup of such an operation that works for tables with different structures? I know there are some cases where the callback won't be called, and I think that is acceptable as long as I also do a full backup periodically.
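For reference, here is a minimal sketch of what that callback looks like from Python, assuming the third-party apsw binding and its setupdatehook API (the standard-library sqlite3 module does not expose sqlite3_update_hook; file and table names are made up):

import apsw

OPS = {
    apsw.SQLITE_INSERT: "INSERT",
    apsw.SQLITE_UPDATE: "UPDATE",
    apsw.SQLITE_DELETE: "DELETE",
}

def on_change(op, db_name, table, rowid):
    # All the hook provides is the operation, schema name, table name and
    # rowid; reconstructing the row itself would still require reading it
    # back, e.g. SELECT * FROM <table> WHERE rowid = ?
    print(OPS[op], db_name, table, rowid)

db = apsw.Connection("app.db")
db.setupdatehook(on_change)
db.cursor().execute("CREATE TABLE IF NOT EXISTS t(x); INSERT INTO t VALUES(1)")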

asked Mar 19 '15 by iuradz


People also ask

How can you take a backup of an SQLite database?

Historically, backups (copies) of SQLite databases have been created using the following method: (1) establish a shared lock on the database file using the SQLite API (i.e. the shell tool); (2) copy the database file using an external tool (for example the Unix 'cp' utility or the DOS 'copy' command).

Does SQL have incremental backup?

SQL Server does not have incremental backups. The closest to it would be a log backup.

Can SQLite store datasets larger than 1 petabyte?

An SQLite database is limited in size to 281 terabytes (2^48 bytes, 256 tebibytes). And even if it could handle larger databases, SQLite stores the entire database in a single disk file, and many filesystems limit the maximum size of files to something less than this.


1 Answer

I've been thinking about this idea; it's not officially endorsed by SQLite, but in theory it sounds very plausible.

The SQLite backup API essentially lets you get a working snapshot of the live database file. Similarly, VACUUM INTO lets you write a compacted copy of the database to a new backup file: https://www.sqlite.org/lang_vacuum.html#vacuuminto
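Both are one-liners in practice. A minimal sketch, assuming Python's standard sqlite3 module (Connection.backup requires Python 3.7+, VACUUM INTO requires SQLite 3.27+; file names are made up):

import sqlite3

src = sqlite3.connect("live.db")

# Option 1: the online backup API copies the live database page by page.
dst = sqlite3.connect("backup-api.db")
src.backup(dst)
dst.close()

# Option 2: VACUUM INTO writes a compacted copy to a new file;
# the target file must not already exist.
src.execute("VACUUM INTO 'backup-vacuum.db'")
src.close()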

This is still just a good ol' full backup, though; we want an incremental backup (kinda like git).

Suppose we want to back up the database every hour, and it's a 1 GB database with relatively few writes; storing 24 GB per day sounds like overkill.

We can take advantage of the SQLite file format: the file is page_size * num_pages bytes, with a fixed 100-byte header at the start of the first page, and SQLite always writes along page boundaries. Both page_size and num_pages are stored in that 100-byte header. See the storage spec: https://www.sqlite.org/fileformat.html
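A minimal sketch of reading those two fields, assuming Python; the offsets come from the file format spec linked above:

import struct

def read_geometry(path):
    with open(path, "rb") as f:
        header = f.read(100)
    # Bytes 16-17: page size, big-endian; the special value 1 means 65536.
    (page_size,) = struct.unpack(">H", header[16:18])
    if page_size == 1:
        page_size = 65536
    # Bytes 28-31: database size in pages, big-endian.
    (num_pages,) = struct.unpack(">I", header[28:32])
    return page_size, num_pages

page_size, num_pages = read_geometry("live.db")
print(page_size, num_pages)  # page_size * num_pages ~= the file size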

So what we can do is create a reference file, which is simply a list of hashes. Say we use SHA-256 (which is what newer git versions can use), so each backup will be a file (e.g. backups/2020-02-22-19-12-00.txt) containing:

sha256(header)
sha256(page1)
sha256(page2)
sha256(page3)

And we store the corresponding pages as individual files, like git does, in an objects directory:

e.g. objects/ab/cdef12343..

The first two characters of the hash are used as the directory name so we don't end up with too many files in a single directory.
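A minimal sketch of the whole scheme, assuming Python (the backups/ and objects/ names are just the examples above; page 1, which contains the 100-byte header, is hashed as one unit here, and this should be run against a stable snapshot rather than a file being actively written):

import hashlib
import os
import struct
import time

def backup(db_path, root="."):
    with open(db_path, "rb") as f:
        data = f.read()
    # Page size: bytes 16-17 (big-endian; 1 means 65536). Page count: bytes 28-31.
    (page_size,) = struct.unpack(">H", data[16:18])
    if page_size == 1:
        page_size = 65536
    (num_pages,) = struct.unpack(">I", data[28:32])

    manifest = []
    for i in range(num_pages):
        page = data[i * page_size:(i + 1) * page_size]
        digest = hashlib.sha256(page).hexdigest()
        manifest.append(digest)
        obj_dir = os.path.join(root, "objects", digest[:2])
        obj_path = os.path.join(obj_dir, digest[2:])
        if not os.path.exists(obj_path):  # dedupe: each unique page is stored once
            os.makedirs(obj_dir, exist_ok=True)
            with open(obj_path, "wb") as out:
                out.write(page)

    os.makedirs(os.path.join(root, "backups"), exist_ok=True)
    stamp = time.strftime("%Y-%m-%d-%H-%M-%S")
    with open(os.path.join(root, "backups", stamp + ".txt"), "w") as out:
        out.write("\n".join(manifest) + "\n")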

Alternatively, you can just upload the page files to any of the cloud storage providers, e.g. GCS, S3, Azure Blobs, or DO Spaces. This could give you multi-region backups.

Since we're not storing duplicate copies of pages, the total size of all the backups is pretty small compared to database_size * num_backups.

You can even use the hashes file as a way to sync/restore the SQLite file; this is how Dropbox/rsync sync files. The hashes file tells us which pages changed, so we only download the changed objects and update those ranges in the file.
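A minimal restore sketch, assuming the layout from the backup sketch above (for sync rather than full restore, you would diff two manifests first and fetch only the pages whose hashes differ):

import os

def restore(manifest_path, out_path, root="."):
    with open(manifest_path) as f:
        hashes = f.read().split()
    # Reassemble the database by concatenating the pages in manifest order.
    with open(out_path, "wb") as out:
        for digest in hashes:
            obj_path = os.path.join(root, "objects", digest[:2], digest[2:])
            with open(obj_path, "rb") as page:
                out.write(page.read())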

answered Oct 18 '22 by user3893988