 

SQLite: what are the practical limits?

Before you mark this question as a duplicate, PLEASE HEAR ME OUT!!
I have already read the questions asked here on how to improve performance, e.g., to mention just a few: Improve INSERT-per-second performance of SQLite? and What are the performance characteristics of sqlite with very large database files?

I am struggling to make SQLite work with a database file size of 5 gigabytes. On the contrary, there are people out there who claim that SQLite works 'great' for them even when the database is as big as 160 GB. I have not tried it myself, but from the questions asked, I guess all the benchmarking is perhaps done with only one table in the database.

I am using a database with
- 20 or so tables
- Half of the tables have more than 15 columns
- Each of these 15-or-so-column tables has 6-7 foreign key columns
- A few of these tables have already grown to 27 million records within a month

The development machine I am using is a 3 GHz quad-core machine with 4 GB of RAM, and yet it takes more than 3 minutes just to query the row count in these big tables.

I couldn't find any way to partition the data horizontally. The best shot I have is to split the data across multiple database files, one for each table. But in that case, as far as I know, the foreign key constraints can't be enforced, so I would have to create self-sufficient tables (without any foreign keys).

So my questions are
a) Am I using the wrong database for the job?
b) Where do you think I am going wrong?
c) I have not added indexes on the foreign keys yet, but if just the row count query takes four minutes, how are indexes on the foreign keys going to help me?

EDIT: To provide more information even though no one has asked for it :) I am using SQLite version 3.7.9 with system.data.sqlite.dll version 1.0.77.0.

EDIT2: I THINK where I differ from the 160 GB guys is that they may select an individual record or a small range of records. But I have to load all 27 million rows in my table, join them against other tables, group the records as asked by the user, and return the results. Any input on what's the best way to optimize the database for such results would be appreciated.

I can't cache the results of a previous query, as that does not make sense in my case. The chances of hitting the cache are fairly low.

asked Mar 12 '12 by WPFAbsoluteNewBie


2 Answers

There's a lot to consider here, but my first bit of advice would be not to take others' performance statistics at face value. Database performance depends on a lot of things, including how your database is structured, the complexity of your queries, which indexes you have defined (or not), and often just the sheer amount of data in them. A lot of reported performance numbers come from a lot of trial and error, and/or from matching the database to the job at hand. To put it another way, the performance you're going to get from any DBMS can't plainly be compared to another application's performance unless your datasets and structures are damn near identical - they are certainly a guide, and perhaps an ideal to strive for, but you're not necessarily going to get insane performance "out of the box."

As a starting point, I would index the data on those really large tables (it looks, from the comments, like you've got that), and see what happens. Granted, the count taking four minutes is a pretty long time, but don't stop there. Add some indexes, change them around, ask if you're storing data that you don't need to store, and look at other database queries, not just the count query, to judge performance. Look for other apps and blog posts that use SQLite for large numbers of rows, and see what they've done to address it (which may include changing databases). Basically, try stuff - and then make a judgement. Don't let the initial fear stop you, thinking that you're going down the wrong path. Maybe you are, maybe you aren't, but don't just stop with the COUNT query. Any way you slice it, 27 million records in a table is a crap ton.
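
As a concrete starting point, here is a minimal sketch using Python's built-in sqlite3 module. The orders/customers tables and column names are invented for illustration (not taken from the question), and an in-memory database stands in for the real file so the sketch runs end to end:

    import sqlite3

    # Hypothetical schema: "orders" stands in for one of the large tables,
    # customer_id for one of its foreign key columns. Point connect() at the
    # real database file instead of ":memory:" to test against real data.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY,
                             customer_id INTEGER REFERENCES customers(id));
    """)

    # Index the foreign key column that the joins filter on.
    conn.execute("CREATE INDEX idx_orders_customer ON orders(customer_id)")

    # EXPLAIN QUERY PLAN shows whether SQLite will use the index or fall back
    # to a full table scan for a representative join.
    plan = conn.execute("""
        EXPLAIN QUERY PLAN
        SELECT customers.region, COUNT(*)
        FROM orders JOIN customers ON customers.id = orders.customer_id
        GROUP BY customers.region
    """).fetchall()
    for row in plan:
        print(row)

    conn.close()

If the plan reports a full scan of the big table where you expected an index search, that's the place to dig long before concluding SQLite can't cope.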

Finally, one specific piece of advice is this: in SQLite, don't split up the database into multiple files - I don't see that helping, because then you're going to have to do a lot of extra query work, and then manually join your separate tables after the results return from multiple queries. This is reinventing what the RDBMS does for you, and it's a crazy idea. You're not going to somehow figure out a way to do joins faster than the creators of the RDBMS - you'd definitely be wasting time there.

answered Sep 17 '22 by jefflunt


SELECT COUNT(*) in SQLite will always be slower compared to other DBMSs, because it does a full table scan for that particular request. It doesn't keep a statistics table to help out. That doesn't mean your application queries will be slow. You need to test your queries to really tell what you can expect.
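
For example, a rough way to do that (a sketch in Python's sqlite3; the file name and query are made-up stand-ins for your real database and one of your real reporting queries) is simply to time the queries you actually run against the real file:

    import sqlite3
    import time

    conn = sqlite3.connect("big.db")  # hypothetical path to the real 5 GB file

    # Stand-in for one of the real reporting queries (join + group by).
    sql = "SELECT customer_id, COUNT(*) FROM orders GROUP BY customer_id"

    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    print(f"{len(rows)} groups in {time.perf_counter() - start:.2f}s")

    conn.close()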

Some general guidelines: Indexing is an absolute must, because navigating a subset of data in a B-tree is a lot faster than traversing an entire table when enormous size is involved. To help load time, you should sort your data on a unique index, and if you don't have a unique index, then on the largest index. If you can drop the indices before loading and put them back afterwards, it will be faster. If these techniques can't meet your operating and SLA parameters, then it's time to do horizontal partitioning and use ATTACH to span the data range you need. SQLite can support up to 10 attached databases by default. I know some say partitioning is the job of the tool, not developers, but when you are facing physical limits you have to roll up your sleeves, or perhaps choose a commercial tool that does it under the covers for you.
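
A rough sketch of what that kind of range partitioning can look like (the file names and the "events" table are invented for illustration): each database file holds the same table for one slice of the data, and ATTACH plus UNION ALL stitches the slices back together for a query that spans them:

    import sqlite3

    # Invented layout: each yearly file holds the same "events" table for one
    # slice of the data. Create two tiny partitions so the sketch runs as-is.
    for year in (2011, 2012):
        db = sqlite3.connect(f"events_{year}.db")
        db.execute("CREATE TABLE IF NOT EXISTS events (user_id INTEGER, ts TEXT)")
        db.execute("INSERT INTO events VALUES (1, ?)", (f"{year}-06-01",))
        db.commit()
        db.close()

    # Open one partition and attach the other; queries can then span both.
    conn = sqlite3.connect("events_2011.db")
    conn.execute("ATTACH DATABASE 'events_2012.db' AS y2012")

    # UNION ALL stitches the partitions back together for a cross-range report.
    rows = conn.execute("""
        SELECT user_id, COUNT(*) AS n
        FROM (
            SELECT user_id FROM events
            UNION ALL
            SELECT user_id FROM y2012.events
        )
        GROUP BY user_id
    """).fetchall()
    print(rows)  # e.g. [(1, 2)]

    conn.close()

Note that, as the question suspects, foreign key constraints can't reach across attached databases, so each partition has to stand on its own.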

answered Sep 18 '22 by Paul Louie