Best practices to structure a database to be scaling-ready

I know this is a very generic and subjective question, so feel free to vote to close it if it does not meet the Stack Overflow netiquette, but for me it's worth a try ;)

I've never built a high-traffic application until now, so apart from some reading on the web I'm not familiar with scaling practices.

How can I design a database so that, when scaling is needed, I don't have to refactor the database structure or the application code?

I know that development (and optimization) should proceed step by step, fixing bottlenecks as they appear, and that it is nearly impossible to design the perfect structure when you don't know how many users you'll have or how they will use the database (e.g. the read/write ratio). I'm just looking for a good base to start from.

What are the best practices for making a structure almost ready to be scaled with partitioning and sharding, and which hacks must absolutely be avoided?

Edit: some details about my application:

The application will run in a multisite setup
I'll have one database for each application version (db_0_0_1, db_0_0_2, etc.)*
Every 'site' will have a schema inside a database* and a role that can access only its own schema
Application code will be mostly PHP, with a few things (daemons and maintenance scripts) in Python
The web server will probably be Nginx, with lighttpd or node.js as support for long-polling tasks (e.g. chat)
Caching will be done with memcached (plus APC for things strictly related to the PHP code, as it cannot be used outside PHP)

asked Dec 13 '11 10:12

Strae

1 Answer

The question is really generic, but here are a few tips:

Do not rely on session-specific values (pg_backend_pid(), inet_client_addr()) or per-session control (SET ROLE, SET SESSION) in application code.
Do not use explicit transaction control (BEGIN/COMMIT/SET TRANSACTION) in application code. All such logic should be wrapped in UDFs. Together with the previous point, this keeps the application stateless, so you can use statement-mode pooling, which allows the fastest possible connection pooling (see the pgbouncer docs and the PostgreSQL wiki for more info).
Encapsulate all App<->DB communication in a well-defined DB API of UDFs; this will let you use PL/Proxy. If doing this for all SELECTs is too hard, do it at least for all data writes (INSERT/UPDATE/DELETE). Example: instead of INSERT INTO users(name) VALUES('Joe') you call SELECT create_user('Joe').
Check your DB schema: is it easy to separate all the data belonging to a given user? (Most probably the user will be the partitioning key.) Everything that is left is common, shared data, which will need to be replicated to all nodes.
Think about caching before you need it. What will the cache key be? What will the cache timeout be? Will you use memcached?
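
To make the last three tips a bit more concrete, here is a minimal Python sketch of how they can fit together on the application side. It is only an illustration under stated assumptions: the shard names, the crc32-based hash, the power-of-two shard count and the cache key layout are all hypothetical choices, not something prescribed by PL/Proxy, pgbouncer or memcached. The only name taken from the tips above is create_user. The %s placeholder follows the convention of common Python PostgreSQL drivers; no database connection is opened here.

```python
import zlib
from typing import Tuple

# Hypothetical shard names. A power-of-two count lets us use a bit mask.
SHARDS = ["users_p0", "users_p1", "users_p2", "users_p3"]


def shard_for(partition_key: str) -> str:
    """Map a partitioning key (here: a user name) to one shard name."""
    count = len(SHARDS)
    if count & (count - 1):
        raise ValueError("shard count must be a power of two")
    # crc32 is stable across processes, unlike Python's built-in hash().
    digest = zlib.crc32(partition_key.encode("utf-8"))
    return SHARDS[digest & (count - 1)]


def create_user_call(name: str) -> Tuple[str, str, Tuple[str, ...]]:
    """Build the single-statement DB API call for creating a user.

    Returns (shard, sql, params). The application sends one parameterized
    function call and never issues BEGIN/COMMIT itself, so the statement
    can travel over any pooled connection.
    """
    return shard_for(name), "SELECT create_user(%s)", (name,)


def cache_key(site: str, entity: str, entity_id: str, version: int = 1) -> str:
    """Compose a namespaced cache key (assumed layout).

    Bumping the version makes all older keys for that entity unreachable,
    which is a cheap way to invalidate them in bulk.
    """
    return f"{site}:{entity}:v{version}:{entity_id}"


if __name__ == "__main__":
    shard, sql, params = create_user_call("Joe")
    print(shard, sql, params)
    print(cache_key("site1", "user", "42"))
```

Because the same key always maps to the same shard, all rows that belong to one user end up on one node, which is the property the schema check above is meant to guarantee. Whether the routing lives in the application (as sketched here) or in the database layer is a design choice; the DB API of functions keeps both options open.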

answered Nov 12 '22 21:11

filiprem

Tags: database, optimization, postgresql, scaling
