Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Extreme Sharding: One SQLite Database Per User

I'm working on a web app that is somewhere between an email service and a social network. I feel it has the potential to grow really big in the future, so I'm concerned about scalability.

Instead of using one centralized MySQL/InnoDB database and then partitioning it when that time comes, I've decided to create a separate SQLite database for each active user: one active user per 'shard'.

That way backing up the database would be as easy as copying each user's small database file to a remote location once a day.

Scaling up will be as easy as adding extra hard disks to store the new files.

When the app grows beyond a single server I can link the servers together at the filesystem level using GlusterFS and run the app unchanged, or rig up a simple SQLite proxy system that will allow each server to manipulate sqlite files in adjacent servers.

Concurrency issues will be minimal because each HTTP request will only touch one or two database files at a time, out of thousands, and SQLite only blocks on reads anyway.

I'm betting that this approach will allow my app to scale gracefully and support lots of cool and unique features. Am I betting wrong? Am I missing anything?

UPDATE I decided to go with a less extreme solution, which is working fine so far. I'm using a fixed number of shards - 256 sqlite databases, to be precise. Each user is assigned and bound to a random shard by a simple hash function.

Most features of my app require access to just one or two shards per request, but there is one in particular that requires the execution of a simple query on 10 to 100 different shards out of 256, depending on the user. Tests indicate it would take about 0.02 seconds, or less, if all the data is cached in RAM. I think I can live with that!

UPDATE 2.0 I ported the app to MySQL/InnoDB and was able to get about the same performance for regular requests, but for that one request that requires shard walking, innodb is 4-5 times faster. For this reason, and other reason, I'm dropping this architecture, but I hope someone somewhere finds a use for it...thanks.

like image 748
Seun Osewa Avatar asked Sep 24 '08 18:09

Seun Osewa


People also ask

Can SQLite be used by multiple users?

Yes SQLite can support multiple users at once. It does however lock the whole database when writing, so if you have lots of concurrent writes it is not the database you want (usually the time the database is locked is a few milliseconds - so for most uses this does not matter).

How much data is too much for SQLite?

SQLite database files have a maximum size of about 140 TB. On a phone, the size of the storage (a few GB) will limit your database file size, while the memory size will limit how much data you can retrieve from a query. Furthermore, Android cursors have a limit of 1 MB for the results.

How many concurrent connections can SQLite handle?

The default limit is 1,024.

Is SQLite good for large scale?

SQLite supports databases up to 281 terabytes in size, assuming you can find a disk drive and filesystem that will support 281-terabyte files. Even so, when the size of the content looks like it might creep into the terabyte range, it would be good to consider a centralized client/server database.


2 Answers

The place where this will fail is if you have to do what's called "shard walking" - which is finding out all the data across a bunch of different users. That particular kind of "query" will have to be done programmatically, asking each of the SQLite databases in turn - and will very likely be the slowest aspect of your site. It's a common issue in any system where data has been "sharded" into separate databases.

If all the of the data is self-contained to the user, then this should scale pretty well - the key to making this an effective design is to know how the data is likely going to be used and if data from one person will be interacting with data from another (in your context).

You may also need to watch out for file system resources - SQLite is great, awesome, fast, etc - but you do get some caching and writing benefits when using a "standard database" (i.e. MySQL, PostgreSQL, etc) because of how they're designed. In your proposed design, you'll be missing out on some of that.

like image 191
heckj Avatar answered Nov 09 '22 04:11

heckj


Sounds to me like a maintenance nightmare. What happens when the schema changes on all those DBs?

like image 27
John Sheehan Avatar answered Nov 09 '22 04:11

John Sheehan