I know this is a very generic and subjective question, so feel free to vote to close it if it does not meet the StackOverflow netiquette.. but for me, it's worth trying ;)
I've never built a high-traffic application since now, so I'm not aware (except for some reading on the web) about scaling practices.
How can I design a database that, when a scaling is needed, I dont have to refactor the database structure, or the application code?
I know that development (and optimization) should come step-by-step, optimize bottleneck as they happen, and is nearly impossible to design the perfect structure when you don't know how many users you'll have and how would they use the database (e.g. read/write ratio), I'm just looking for a good base to start.
What are the best practices for making a structure almost ready to be scaled with partitioning
and sharding
, and what hacks
must be absolutely avoided?
Edit some detail about my application:
Database Limitations These types of N+1 queries can serverely impact performance. The best solution to most database scalability issues is optimizing SQL queries and implementing indexing strategies. By building articles and authors into a single query, you can dramatically reduce the volume of queries you're running.
Easier to scale: NoSQL databases are easier to scale. They're designed to be fragmented across multiple data centers without much difficulty. No data preparation required: When there isn't time to design a complex model, and you need to get a database running fast, non-relational databases save a lot of time.
The question is really generic, but here are few tips:
Do not use any session variables (pg_backend_pid(), inet_client_addr()) or per-session control (SET ROLE, SET SESSION) in application code.
Do not use explicit transaction control (BEGIN/COMMIT/SET TRANSACTION) in application code. All such logic should be wrapped in UDFs. This enables stateless, statement-mode pooling which enables fastest possible DB pooling. (see pgbouncer docs, and pg wiki for more info)
Encapsulate all App<->Db communication in well defined DB API of UDFs - this will let you use PL/Proxy. If doing this with all SELECTs is too hard, do it at least for all data writes (INSERT/UPDATE/DELETE). Example: instead of INSERT INTO users(name) VALUES('Joe')
you need SELECT create_user('Joe')
.
check your DB schema - is it easy to separate all data belonging to given user? (most probably this will be the partitioning key). All that's left is common, shared data which will need to be replicated to all nodes.
think of caching before you need it. what will be caching key? what will be cache timeout? will you use memcached?
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With