2020 update, about 11 years after the question was posted and later closed, preventing newer answers.
Almost everything written here is obsolete. Once upon a time SQLite was limited to available memory, or to 2 GB of storage (32-bit limits), or other popular numbers... well, that was a long time ago.
Official limitations are listed here. In practice, SQLite is likely to work as long as there is storage available. It works well with datasets larger than memory; it was originally created when memory was scarce, and that was an important design point from the start.
There is absolutely no issue with storing 100 GB of data. It could probably store a TB just fine, but eventually that's the point where you need to question whether SQLite is the best tool for the job, and you probably want features of a full-fledged database (remote clients, concurrent writes, read-only replicas, sharding, etc.).
Original:
I know that sqlite doesn't perform well with extremely large database files even when they are supported (there used to be a comment on the sqlite website stating that if you need file sizes above 1GB you may want to consider using an enterprise rdbms. Can't find it anymore, might be related to an older version of sqlite).
However, for my purposes I'd like to get an idea of how bad it really is before I consider other solutions.
I'm talking about sqlite data files in the multi-gigabyte range, from 2GB onwards. Anyone have any experience with this? Any tips/ideas?
SQLite supports databases up to 281 terabytes in size, assuming you can find a disk drive and filesystem that will support 281-terabyte files. Even so, when the size of the content looks like it might creep into the terabyte range, it would be good to consider a centralized client/server database.
An unlikely requirement for an engine popular on Android and iOS. SQLite, which claims to be "used more than all other database engines combined", has been updated to version 3.33.0 with the maximum size increased to 281 TB, around twice the previous capacity of 140 TB.
SQLite 2.7.6 is often faster (sometimes more than twice as fast) than MySQL 3.23.41 for most common operations. SQLite does not execute CREATE INDEX or DROP TABLE as fast as the other databases.
The SQLite docs explain why per-transaction writes are slow: Transaction speed is limited by disk drive speed because (by default) SQLite actually waits until the data really is safely stored on the disk surface before the transaction is complete. That way, if you suddenly lose power or if your OS crashes, your data is still safe.
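Since every committed transaction waits for a disk sync, the usual mitigation is to batch many inserts into one explicit transaction so that cost is paid once per batch rather than once per row. A minimal sketch using Python's built-in sqlite3 module (the events table and file name here are made up for illustration):

    import sqlite3

    conn = sqlite3.connect("example.db")  # hypothetical file name
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, value INTEGER)"
    )

    rows = [(i,) for i in range(100_000)]

    # One transaction around the whole batch: a single commit, a single disk sync.
    with conn:  # commits on success, rolls back on exception
        conn.executemany("INSERT INTO events (value) VALUES (?)", rows)

    conn.close()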
So I did some tests with sqlite for very large files, and came to some conclusions (at least for my specific application).
The tests involved a single SQLite file with either a single table or multiple tables. Each table had about 8 columns, almost all integers, and 4 indices.
The idea was to insert enough data until the SQLite file was about 50 GB.
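For reference, a rough approximation of that test setup in Python's sqlite3 module; the answer does not give the actual schema, so the table, column, and index names below are invented:

    import sqlite3

    conn = sqlite3.connect("test_single_table.db")  # hypothetical file name
    conn.execute("""
        CREATE TABLE IF NOT EXISTS samples (
            id        INTEGER PRIMARY KEY,
            device_id INTEGER,
            sensor_id INTEGER,
            ts        INTEGER,
            value     INTEGER,
            flags     INTEGER,
            batch_id  INTEGER,
            extra     INTEGER
        )
    """)

    # Four secondary indices, as in the tests; every insert must also update each of these.
    for col in ("device_id", "sensor_id", "ts", "batch_id"):
        conn.execute(f"CREATE INDEX IF NOT EXISTS idx_samples_{col} ON samples({col})")

    def insert_batch(rows):
        # rows: iterable of 7-tuples matching the columns below
        with conn:
            conn.executemany(
                "INSERT INTO samples "
                "(device_id, sensor_id, ts, value, flags, batch_id, extra) "
                "VALUES (?, ?, ?, ?, ?, ?, ?)",
                rows,
            )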
Single Table
I tried to insert multiple rows into a sqlite file with just one table. When the file was about 7GB (sorry I can't be specific about row counts) insertions were taking far too long. I had estimated that my test to insert all my data would take 24 hours or so, but it did not complete even after 48 hours.
This leads me to conclude that a single, very large sqlite table will have issues with insertions, and probably other operations as well.
I guess this is no surprise: as the table gets larger, inserting rows and updating all the indices takes longer.
Multiple Tables
I then tried splitting the data by time over several tables, one table per day. The data from the original single table was split into ~700 tables.
This setup had no problems with insertion: it did not take longer as time progressed, since a new table was created for every day.
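A sketch of what such a one-table-per-day layout could look like; the naming scheme and columns are assumptions, not the author's actual code. Because each day's indices only ever cover one day of rows, they stay small and insert cost stays roughly constant:

    import sqlite3
    from datetime import date

    conn = sqlite3.connect("test_daily_tables.db")  # hypothetical file name

    def table_for(day: date) -> str:
        # e.g. samples_2009_10_31; identifiers can't be bound as parameters,
        # so the name is built only from a trusted date value.
        return f"samples_{day:%Y_%m_%d}"

    def ensure_day_table(day: date) -> str:
        name = table_for(day)
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {name} "
            "(id INTEGER PRIMARY KEY, sensor_id INTEGER, ts INTEGER, value INTEGER)"
        )
        conn.execute(f"CREATE INDEX IF NOT EXISTS idx_{name}_ts ON {name}(ts)")
        return name

    def insert_for_day(day: date, rows):
        name = ensure_day_table(day)
        with conn:
            conn.executemany(
                f"INSERT INTO {name} (sensor_id, ts, value) VALUES (?, ?, ?)", rows
            )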
Vacuum Issues
As pointed out by i_like_caffeine, the VACUUM command becomes more of a problem the larger the SQLite file is. As more inserts/deletes are done, fragmentation of the file on disk gets worse, so the goal is to VACUUM periodically to optimize the file and recover file space.
However, as the documentation points out, a vacuum makes a full copy of the database, which takes a very long time to complete. So the smaller the database, the faster this operation will finish.
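A minimal maintenance sketch (the file name is assumed); note that VACUUM cannot run inside an open transaction and needs roughly the database's size in free disk space while it rewrites the file:

    import sqlite3

    def vacuum(path: str) -> None:
        conn = sqlite3.connect(path)
        try:
            # Rewrites the whole database into a new file, then swaps it in;
            # runtime grows with file size, so smaller files finish faster.
            conn.execute("VACUUM")
        finally:
            conn.close()

    vacuum("test_daily_tables.db")  # hypothetical file name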
Conclusions
For my specific application, I'll probably be splitting out data over several db files, one per day, to get the best of both vacuum performance and insertion/delete speed.
This complicates queries, but for me, it's a worthwhile tradeoff to be able to index this much data. An additional advantage is that I can just delete a whole db file to drop a day's worth of data (a common operation for my application).
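A sketch of how such a one-file-per-day layout might look (paths and schema are hypothetical); dropping a day is then just deleting its file, and queries spanning days could ATTACH several files at the cost of more complex SQL:

    import os
    import sqlite3
    from datetime import date

    DATA_DIR = "data"  # hypothetical location

    def db_path(day: date) -> str:
        return os.path.join(DATA_DIR, f"samples_{day:%Y-%m-%d}.db")

    def open_day(day: date) -> sqlite3.Connection:
        os.makedirs(DATA_DIR, exist_ok=True)
        conn = sqlite3.connect(db_path(day))
        conn.execute(
            "CREATE TABLE IF NOT EXISTS samples "
            "(id INTEGER PRIMARY KEY, sensor_id INTEGER, ts INTEGER, value INTEGER)"
        )
        return conn

    def drop_day(day: date) -> None:
        # Dropping a whole day's data is just deleting its file: no DELETE, no VACUUM.
        try:
            os.remove(db_path(day))
        except FileNotFoundError:
            pass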
I'd probably have to monitor table size per file as well to see when the speed will become a problem.
It's too bad that there doesn't seem to be an incremental vacuum method other than auto_vacuum. I can't use it because my goal for vacuuming is to defragment the file (file space isn't a big deal), which auto_vacuum does not do. In fact, the documentation states it may make fragmentation worse, so I have to resort to periodically doing a full vacuum on the file.
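For completeness, a sketch of those vacuum options (file name assumed): auto_vacuum in INCREMENTAL mode lets you reclaim free pages a little at a time via PRAGMA incremental_vacuum, but neither auto_vacuum mode defragments the file; only a full VACUUM rewrites and repacks it:

    import sqlite3

    conn = sqlite3.connect("test_daily_tables.db")  # hypothetical file name

    # 0 = NONE, 1 = FULL, 2 = INCREMENTAL; turning auto_vacuum on for an
    # existing database only takes effect after a subsequent VACUUM.
    print(conn.execute("PRAGMA auto_vacuum").fetchone())
    print(conn.execute("PRAGMA freelist_count").fetchone())  # free pages awaiting reuse

    # Reclaim up to 1000 free pages without rewriting the whole file
    # (only has an effect when auto_vacuum is INCREMENTAL).
    conn.execute("PRAGMA incremental_vacuum(1000)")

    # Defragmentation still requires a full rewrite:
    conn.execute("VACUUM")
    conn.close()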