When to use horizontal partitioning and when to use database sharding?

I'm reading this article on Wikipedia, http://en.wikipedia.org/wiki/Shard_(database_architecture), trying to find the major difference between these two techniques. Here is what I found:

Horizontal partitioning splits one or more tables by row, usually within a single instance of a schema and a database server. It may offer an advantage by reducing index size (and thus search effort) provided that there is some obvious, robust, implicit way to identify in which table a particular row will be found, without first needing to search the index, e.g., the classic example of the 'CustomersEast' and 'CustomersWest' tables, where their zip code already indicates where they will be found.

Sharding goes beyond this: it partitions the problematic table(s) in the same way, but it does this across potentially multiple instances of the schema. The obvious advantage would be that search load for the large partitioned table can now be split across multiple servers (logical or physical), not just multiple indexes on the same logical server.

As I understand it, horizontal partitioning is more applicable to a single instance (a single-node environment), whereas sharding is used in multi-node or multi-data-center environments. Is this correct, or are there different usage scenarios?

Extra question: for a huge table (millions of rows) with a simple schema (about 4-5 columns), what is the best technique to improve read/write performance on this table?


asked Jun 01 '14 08:06

Dio Phung

1 Answer

You are correct: horizontal partitioning (supported, for example, in MySQL and PostgreSQL) splits a table up within a single server. This can improve performance because data and indexes can be split across many disk volumes, improving I/O. This is usually done with a key range.
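To illustrate what "a key range" means here, a minimal Python sketch of how a row's key maps to a partition. In practice the DBMS does this routing for you once the partitions are declared; the partition names and boundary values below are made up for the example.

```python
import bisect

# Hypothetical exclusive upper bounds for the first three partitions;
# the last partition takes every key at or above the final bound.
PARTITION_BOUNDS = [1_000_000, 2_000_000, 3_000_000]
PARTITION_NAMES = ["customers_p0", "customers_p1", "customers_p2", "customers_p3"]


def partition_for(customer_id: int) -> str:
    """Return the name of the partition whose key range contains customer_id."""
    # bisect_right counts how many bounds are <= customer_id, which is
    # exactly the index of the partition that owns the key.
    return PARTITION_NAMES[bisect.bisect_right(PARTITION_BOUNDS, customer_id)]


if __name__ == "__main__":
    for key in (42, 1_000_000, 2_500_000, 9_999_999):
        print(key, "->", partition_for(key))
```

All partitions still live on the same server, so a query that filters on the key only has to touch one smaller table and index.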

With database sharding, you are dividing the data across multiple servers, not just within a single server. In this case you use a shard key to partition the data, typically with some sort of hashing algorithm. You can get a white paper on this subject here (offered by our company; it is not specific to any product and explains the technology): http://www.codefutures.com/database-sharding-white-paper/
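As a rough sketch of hash-based routing on a shard key, assuming a fixed list of shard servers (the host names are hypothetical) and simple modulo placement:

```python
import hashlib

# Hypothetical shard servers; each one would hold its own copy of the schema.
SHARDS = [
    "db-shard-0.example.internal",
    "db-shard-1.example.internal",
    "db-shard-2.example.internal",
    "db-shard-3.example.internal",
]


def shard_for(shard_key: str) -> str:
    """Map a shard key (for example a customer id) to one shard server."""
    # Use a stable hash: Python's built-in hash() is randomized per process
    # for strings, so it would route the same key differently after a restart.
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(SHARDS)
    return SHARDS[index]


if __name__ == "__main__":
    for key in ("customer:1001", "customer:1002", "customer:1003"):
        print(key, "->", shard_for(key))
```

Note that with plain modulo placement, changing the number of shards remaps most keys. Schemes such as consistent hashing or a lookup directory are commonly used to limit that data movement.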

The advantage of single-server DBMS partitioning is that it is relatively simple to set up and manage. The disadvantage is that you are ultimately limited by what a single server can do. This is particularly the case when it comes to heavy write contention, database locking and heavy queries.

Database sharding takes more work, but it has the advantage of being a shared-nothing approach: each shard has its own CPU, memory and storage, so capacity can grow by adding servers.

A clear indicator that database sharding is needed is when a single server cannot keep up with write volume. If you have many heavy queries, this can also drive the need for this type of solution.

Having said all of this, if you are talking about "millions" of rows with 4-5 columns, and your reads can be well-indexed for fast access, it is doubtful that you need to implement either of these options. When you are talking about hundreds of millions or billions of rows, with thousands of users, that is where database scalability becomes critical.
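To check whether a read is "well-indexed" before reaching for partitioning or sharding, you can ask the database for its query plan. Here is a small sketch using Python's built-in sqlite3 module as a stand-in for your actual DBMS; the table, column and index names are invented for the example, and other databases have their own EXPLAIN syntax and output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings ("
    "id INTEGER PRIMARY KEY, sensor_id INTEGER, taken_at TEXT, value REAL)"
)
# Populate a narrow table: 100,000 rows spread over 1,000 sensor ids.
conn.executemany(
    "INSERT INTO readings (sensor_id, taken_at, value) VALUES (?, ?, ?)",
    ((i % 1000, f"2014-06-01T00:00:{i % 60:02d}", float(i)) for i in range(100_000)),
)
# Index the column that the read queries filter on.
conn.execute("CREATE INDEX idx_readings_sensor ON readings (sensor_id)")


def query_plan(sql: str, params: tuple = ()) -> str:
    """Return SQLite's plan description for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


plan = query_plan("SELECT * FROM readings WHERE sensor_id = ?", (42,))
print(plan)
```

If the plan names the index rather than reporting a full table scan, lookups on that column stay fast as the table grows, which is often enough at the "millions of rows" scale.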

There is also an informational Web site I am working on regarding database scalability: www.bigdatascalability.com. It includes links to various articles, and will have new content added over time.

answered Oct 09 '22 08:10

dbschwartz

Tags:

database-design

database-performance

sharding

database-partitioning
