 

Can SQLite handle 90 million records?

Tags:

sql

sqlite

Or should I use a different hammer for this problem?

I've got a very simple use-case for storing data, effectively a sparse matrix, which I've attempted to store in a SQLite database. I've created a table:

create TABLE data ( id1 INTEGER KEY, timet INTEGER KEY, value REAL ) 

into which I insert a lot of data (800 elements every 10 minutes, 45 times a day), most days of the year. The tuple (id1, timet) will always be unique.

The timet value is seconds since the epoch, and will always be increasing. The id1 is, for all practical purposes, a random integer. There are probably only 20,000 unique ids, though.

I'd then like to access all values where id1==someid or access all elements where timet==sometime. On my tests using the latest SQLite via the C interface on Linux, a lookup for one of these (or any variant of this lookup) takes approximately 30 seconds, which is not fast enough for my use case.

I tried defining an index for the database, but this slowed down insertion to completely unworkable speeds (I might have done this incorrectly though...)

The table above leads to very slow access for any data. My question is:

  • Is SQLite completely the wrong tool for this?
  • Can I define indices to speed things up significantly?
  • Should I be using something like HDF5 instead of SQL for this?
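To the second question: yes, an index per lookup column is exactly what these two query patterns need. Below is a minimal sketch in Python (using the stdlib sqlite3 module rather than the C API, purely for brevity); the table layout mirrors the question, and the index names are illustrative, not prescribed:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id1 INTEGER, timet INTEGER, value REAL)")

# Load a small sample inside one transaction (the `with` block commits once).
with conn:
    conn.executemany(
        "INSERT INTO data VALUES (?, ?, ?)",
        [(i % 100, i // 100, 0.5) for i in range(10_000)],
    )

# One index per lookup column; with these in place, both queries below
# become index seeks instead of full-table scans.
conn.execute("CREATE INDEX idx_id1 ON data(id1)")
conn.execute("CREATE INDEX idx_timet ON data(timet)")

rows_by_id = conn.execute(
    "SELECT timet, value FROM data WHERE id1 = ?", (7,)).fetchall()
rows_by_time = conn.execute(
    "SELECT id1, value FROM data WHERE timet = ?", (3,)).fetchall()
print(len(rows_by_id), len(rows_by_time))  # 100 100
```

The same two CREATE INDEX statements, issued through sqlite3_prepare_v2/sqlite3_step in the C interface, should turn both lookups into sub-second operations on a table of this size.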

Please excuse my very basic understanding of SQL!

Thanks

I include a code sample that shows how the insertion speed slows to a crawl when using indices. With the 'create index' statements in place, the code takes 19 minutes to complete. Without that, it runs in 18 seconds.


#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <string>
#include <sqlite3.h>

void checkdbres( int res, int expected, const std::string msg )
{
  if (res != expected) { std::cerr << msg << std::endl; exit(1); }
}

int main(int argc, char **argv)
{
  const size_t nRecords = 800*45*30;

  sqlite3      *dbhandle = NULL;
  sqlite3_stmt *pStmt = NULL;
  char statement[512];

  checkdbres( sqlite3_open("/tmp/junk.db", &dbhandle ), SQLITE_OK, "Failed to open db");

  checkdbres( sqlite3_prepare_v2( dbhandle, "create table if not exists data ( issueid INTEGER KEY, time INTEGER KEY, value REAL);", -1, &pStmt, NULL ), SQLITE_OK, "Failed to build create statement");
  checkdbres( sqlite3_step( pStmt ), SQLITE_DONE, "Failed to execute create statement" );
  checkdbres( sqlite3_finalize( pStmt ), SQLITE_OK, "Failed to finalize create");

  checkdbres( sqlite3_prepare_v2( dbhandle, "create index issueidindex on data (issueid);", -1, &pStmt, NULL ), SQLITE_OK, "Failed to build create-index statement");
  checkdbres( sqlite3_step( pStmt ), SQLITE_DONE, "Failed to execute create-index statement" );
  checkdbres( sqlite3_finalize( pStmt ), SQLITE_OK, "Failed to finalize create-index");

  checkdbres( sqlite3_prepare_v2( dbhandle, "create index timeindex on data (time);", -1, &pStmt, NULL ), SQLITE_OK, "Failed to build create-index statement");
  checkdbres( sqlite3_step( pStmt ), SQLITE_DONE, "Failed to execute create-index statement" );
  checkdbres( sqlite3_finalize( pStmt ), SQLITE_OK, "Failed to finalize create-index");

  for ( size_t idx = 0; idx < nRecords; ++idx )
  {
    if (idx % 800 == 0)
    {
      checkdbres( sqlite3_prepare_v2( dbhandle, "BEGIN TRANSACTION", -1, &pStmt, NULL ), SQLITE_OK, "Failed to begin transaction");
      checkdbres( sqlite3_step( pStmt ), SQLITE_DONE, "Failed to execute begin transaction" );
      checkdbres( sqlite3_finalize( pStmt ), SQLITE_OK, "Failed to finalize begin transaction");
      std::cout << "idx " << idx << " of " << nRecords << std::endl;
    }

    const size_t time    = idx / 800;
    const size_t issueid = idx % 800;
    const float  value   = static_cast<float>(rand()) / RAND_MAX;
    sprintf( statement, "insert into data values (%d,%d,%f);", (int)issueid, (int)time, value );
    checkdbres( sqlite3_prepare_v2( dbhandle, statement, -1, &pStmt, NULL ), SQLITE_OK, "Failed to build insert statement");
    checkdbres( sqlite3_step( pStmt ), SQLITE_DONE, "Failed to execute insert statement" );
    checkdbres( sqlite3_finalize( pStmt ), SQLITE_OK, "Failed to finalize insert");

    if (idx % 800 == 799)
    {
      checkdbres( sqlite3_prepare_v2( dbhandle, "END TRANSACTION", -1, &pStmt, NULL ), SQLITE_OK, "Failed to end transaction");
      checkdbres( sqlite3_step( pStmt ), SQLITE_DONE, "Failed to execute end transaction" );
      checkdbres( sqlite3_finalize( pStmt ), SQLITE_OK, "Failed to finalize end transaction");
    }
  }

  checkdbres( sqlite3_close( dbhandle ), SQLITE_OK, "Failed to close db" );
}
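A common workaround for the slowdown seen above is to defer index creation until after the bulk load: maintaining the B-trees on every INSERT is what drags the loop down, while building each index once over the finished table does the same work in a single pass. Here is a hedged sketch of that ordering in Python (stdlib sqlite3; the index names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id1 INTEGER, timet INTEGER, value REAL)")

# Bulk-load first, with no indexes defined, in a single transaction.
rows = [(i % 800, i // 800, 0.1) for i in range(800 * 45)]  # one day of samples
with conn:
    conn.executemany("INSERT INTO data VALUES (?, ?, ?)", rows)

# Build the indexes only after the data is in place.
conn.execute("CREATE INDEX idx_id1 ON data(id1)")
conn.execute("CREATE INDEX idx_timet ON data(timet)")

count = conn.execute("SELECT COUNT(*) FROM data").fetchone()[0]
print(count)  # 36000
```

For an append-only workload like this one, where new batches arrive continuously, the indexes would be created once and the per-batch insert cost then comes down to keeping two modest B-trees up to date inside each transaction.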

Brian O'Kennedy asked Jul 01 '10


People also ask

How many records can SQLite handle?

The theoretical maximum number of rows in a table is 2^64 (18446744073709551616, or about 1.8e19). This limit is unreachable, since the maximum database size of 281 terabytes will be reached first.

Is SQLite good for large data?

In practice, SQLite is likely to work as long as storage is available. It works well with datasets larger than memory; it was originally created when memory was scarce, and that was an important design point from the start. There is absolutely no issue with storing 100 GB of data.

Why is SQLite not good?

Because SQLite reads and writes directly to an ordinary disk file, the only applicable access permissions are the typical access permissions of the underlying operating system. This makes SQLite a poor choice for applications that require multiple users with special access permissions.


1 Answer

Are you inserting all of the 800 elements at once? If you are, doing the inserts within a transaction will speed up the process dramatically.

See http://www.sqlite.org/faq.html#q19

SQLite can handle very large databases. See http://www.sqlite.org/limits.html
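The transaction advice can be sketched as follows in Python (stdlib sqlite3; the schema mirrors the question, and the batch sizes are the question's 800-readings-per-10-minutes pattern). Each `with conn:` block commits once per batch, so the whole batch shares a single disk sync instead of paying one per INSERT:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id1 INTEGER, timet INTEGER, value REAL)")

# Simulate three 10-minute batches of 800 readings each.
for t in range(3):
    batch = [(i, 1_000_000 + 600 * t, 0.0) for i in range(800)]
    with conn:  # one transaction (and one commit) per batch
        conn.executemany("INSERT INTO data VALUES (?, ?, ?)", batch)

total = conn.execute("SELECT COUNT(*) FROM data").fetchone()[0]
print(total)  # 2400
```

In the C interface this corresponds to what the question's sample already does: a BEGIN TRANSACTION before each batch of 800 and an END TRANSACTION after it, executed via sqlite3_prepare_v2/sqlite3_step.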

Robert Harvey answered Sep 22 '22