 

SQLite or MySQL for large datasets

Tags:

sqlite

mysql

r

I am working with large datasets (tens of millions of records, at times hundreds of millions) and want to use a database program that links well with R. I am trying to decide between MySQL and SQLite. The data is static, but there are a lot of queries that I need to run.
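For reference, here is roughly how I connect from R using the DBI package (a minimal sketch; the file, database, and table names are placeholders):

    # Minimal sketch of connecting R to each backend via DBI.
    # File, database, and table names are placeholders.
    library(DBI)

    # SQLite: the whole database lives in a single file on disk.
    con <- dbConnect(RSQLite::SQLite(), "records.sqlite")

    # MySQL: a client/server engine, so you connect to a host instead of a file.
    # con <- dbConnect(RMySQL::MySQL(), dbname = "records", host = "localhost")

    dbGetQuery(con, "SELECT COUNT(*) FROM events")
    dbDisconnect(con)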

This page from the SQLite help states:

"With the default page size of 1024 bytes, an SQLite database is limited in size to 2 terabytes (241 bytes). And even if it could handle larger databases, SQLite stores the entire database in a single disk file and many filesystems limit the maximum size of files to something less than this. So if you are contemplating databases of this magnitude, you would do well to consider using a client/server database engine that spreads its content across multiple disk files, and perhaps across multiple volumes."

I'm not sure what this means. When I have experimented with MySQL and SQLite, MySQL seemed faster, but I haven't constructed very rigorous speed tests. I'm wondering whether MySQL is a better choice for me than SQLite given the size of my dataset. The description above seems to suggest that this might be the case, but my data is nowhere near 2TB.

I'd appreciate any insights into this maximum-file-size constraint imposed by the filesystem, and how it could affect speed for indexing tables and running queries. That would really help me decide which database to use for my analysis.
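For what it's worth, my "speed tests" so far have been along these lines (a rough timing sketch, not a rigorous benchmark; the query and table names are made up):

    # Rough timing harness: run the same query several times and take the
    # median elapsed time to smooth out caching effects.
    library(DBI)

    time_query <- function(con, sql, reps = 5) {
      median(replicate(reps, system.time(dbGetQuery(con, sql))["elapsed"]))
    }

    con <- dbConnect(RSQLite::SQLite(), "records.sqlite")
    time_query(con, "SELECT COUNT(*) FROM events WHERE year = 2010")
    dbDisconnect(con)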

asked Jun 11 '11 by exl

People also ask

Is SQLite good for large data?

SQLite supports databases up to 281 terabytes in size, assuming you can find a disk drive and filesystem that will support 281-terabyte files. Even so, when the size of the content looks like it might creep into the terabyte range, it would be good to consider a centralized client/server database.

Which type of database is best for very large data sets?

NoSQL allows for high-performance, agile processing of information at massive scale. It stores unstructured data across multiple processing nodes, as well as across multiple servers. As such, the NoSQL distributed database infrastructure has been the solution of choice for some of the largest data warehouses.

How big is too big for SQLite?

The maximum size of a database file is 4294967294 pages. At the maximum page size of 65536 bytes, this translates into a maximum database size of approximately 2.8e+14 bytes (281 terabytes, or 256 tebibytes, or 281,474 gigabytes, or 256,000 gibibytes).
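As a sanity check, that figure is just the page arithmetic, reproducible in R:

    # Max database size = max page count * max page size.
    4294967294 * 65536        # ~2.81e+14 bytes (about 281 terabytes)
    4294967294 * 65536 / 2^40 # ~256 tebibytes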

Is MySQL suitable for big data?

MySQL was not designed for running complicated queries against massive data volumes that require crunching through data at huge scale. The MySQL optimizer is quite limited, executing a single query at a time using a single thread.


1 Answer

The SQLite database engine stores the entire database in a single file. This may not be very efficient for incredibly large databases (with the default page size, SQLite's limit is 2TB, as you found in the help). In addition, SQLite allows only one writer at a time, though concurrent readers are fine. If your application is web based or might end up being multi-threaded (like an AsyncTask on Android), MySQL is probably the way to go.
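As an aside, that 2TB figure assumes SQLite's default 1024-byte page size; a larger page size raises the ceiling. If you did stay with SQLite, something along these lines would apply (a sketch via DBI from R; the page size must be set before the first table is created):

    library(DBI)

    con <- dbConnect(RSQLite::SQLite(), "big.sqlite")
    # A larger page size raises the maximum database size. Set it before any
    # tables exist (or follow it with VACUUM to rebuild the file).
    dbExecute(con, "PRAGMA page_size = 65536")
    dbExecute(con, "VACUUM")
    dbDisconnect(con)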

Personally, since you've done tests and MySQL is faster, I'd just go with MySQL. It will be more scalable going into the future and will allow you to do more.
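Whichever engine you pick, since your data is static and query-heavy, building indexes on the columns you filter on (after bulk loading) will matter a great deal for query speed. A sketch with a hypothetical table and column:

    library(DBI)

    con <- dbConnect(RSQLite::SQLite(), "records.sqlite")
    # For static data: bulk-load first, then index the filter columns.
    dbExecute(con, "CREATE INDEX idx_events_year ON events(year)")
    dbExecute(con, "ANALYZE")  # gather stats so the planner uses the index
    dbDisconnect(con)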

answered Oct 12 '22 by SubmittedDenied