Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Best practices to structure a database to be scaling-ready

I know this is a very generic and subjective question, so feel free to vote to close it if it does not meet the StackOverflow netiquette.. but for me, it's worth trying ;)

I've never built a high-traffic application since now, so I'm not aware (except for some reading on the web) about scaling practices.

How can I design a database that, when a scaling is needed, I dont have to refactor the database structure, or the application code?

I know that development (and optimization) should come step-by-step, optimize bottleneck as they happen, and is nearly impossible to design the perfect structure when you don't know how many users you'll have and how would they use the database (e.g. read/write ratio), I'm just looking for a good base to start.

What are the best practices for making a structure almost ready to be scaled with partitioning and sharding, and what hacks must be absolutely avoided?

Edit some detail about my application:

  1. The application will run as a multisite behavior
  2. I'll have a database for each application version (db_0_0_1, db_0_0_2, etc..)*
  3. Every 'site' will have a schema inside a database* and a role that can access only his own schemas
  4. Application code will be mostly PHP and few things (daemons and maintenance things) in Python
  5. Web server will probably be Nginx and lighttpd or node.js as support for long-polling tasks (e.g. chat)
  6. Caching will be done with memcached (plus apc for things strictly related to the php code, as it can be used outside php)
like image 647
Strae Avatar asked Dec 13 '11 10:12

Strae


People also ask

How can database scalability be improved?

Database Limitations These types of N+1 queries can serverely impact performance. The best solution to most database scalability issues is optimizing SQL queries and implementing indexing strategies. By building articles and authors into a single query, you can dramatically reduce the volume of queries you're running.

Which database is the best fit for scalable design?

Easier to scale: NoSQL databases are easier to scale. They're designed to be fragmented across multiple data centers without much difficulty. No data preparation required: When there isn't time to design a complex model, and you need to get a database running fast, non-relational databases save a lot of time.


1 Answers

The question is really generic, but here are few tips:

  • Do not use any session variables (pg_backend_pid(), inet_client_addr()) or per-session control (SET ROLE, SET SESSION) in application code.

  • Do not use explicit transaction control (BEGIN/COMMIT/SET TRANSACTION) in application code. All such logic should be wrapped in UDFs. This enables stateless, statement-mode pooling which enables fastest possible DB pooling. (see pgbouncer docs, and pg wiki for more info)

  • Encapsulate all App<->Db communication in well defined DB API of UDFs - this will let you use PL/Proxy. If doing this with all SELECTs is too hard, do it at least for all data writes (INSERT/UPDATE/DELETE). Example: instead of INSERT INTO users(name) VALUES('Joe') you need SELECT create_user('Joe').

  • check your DB schema - is it easy to separate all data belonging to given user? (most probably this will be the partitioning key). All that's left is common, shared data which will need to be replicated to all nodes.

  • think of caching before you need it. what will be caching key? what will be cache timeout? will you use memcached?

like image 196
filiprem Avatar answered Nov 12 '22 21:11

filiprem