What is the difference between table distribution and table partitioning in SQL?

1 Answer

Azure DW has up to 60 compute nodes as part of its MPP architecture. When you store a table on Azure DW, you are storing it across those nodes. Your table's data is distributed across the nodes using hash distribution or round-robin distribution, depending on your needs. You can also choose to have a table (preferably a very small one) replicated to every node.


That is distribution. Each node has its own distinct records that only that node worries about when interacting with the data. It's a shared-nothing architecture.
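To make the three options concrete, here is a minimal Python sketch that simulates them in memory. It is an illustration only, not how Azure DW is implemented: the node count, the `crc32` hash, and the `orders` rows are all assumptions made up for the example.

```python
import zlib
from collections import defaultdict

NODE_COUNT = 4  # hypothetical; kept small so the result is easy to inspect


def hash_node(key: str, node_count: int = NODE_COUNT) -> int:
    """Pick a node from a stable hash of the distribution key (crc32 is an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % node_count


def distribute_hash(rows, key_column, node_count=NODE_COUNT):
    """Hash distribution: rows with equal key values always land on the same node."""
    nodes = defaultdict(list)
    for row in rows:
        nodes[hash_node(str(row[key_column]), node_count)].append(row)
    return dict(nodes)


def distribute_round_robin(rows, node_count=NODE_COUNT):
    """Round-robin distribution: rows are dealt out evenly, ignoring their content."""
    nodes = defaultdict(list)
    for i, row in enumerate(rows):
        nodes[i % node_count].append(row)
    return dict(nodes)


def replicate(rows, node_count=NODE_COUNT):
    """Replication: every node keeps a full copy (only sensible for small tables)."""
    return {node: list(rows) for node in range(node_count)}


# Made-up sample data: 12 orders spread over 3 customers.
orders = [{"order_id": i, "customer": f"cust{i % 3}"} for i in range(12)]

by_hash = distribute_hash(orders, "customer")
by_round_robin = distribute_round_robin(orders)
by_copy = replicate(orders)
```

With hash distribution, all of a customer's orders sit on one node, which helps joins and aggregations on that key. Round robin spreads rows evenly but scatters each customer across nodes.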


Partitioning is completely divorced from this concept of distribution. When we partition a table, we decide which rows belong in which partition based on some scheme (for instance, partitioning an order table by order.create_date). The records for each create_date then get stored in their own table, separate from the records for any other create_date (invisibly, behind the scenes).

Partitioning is nice because you may find that you only want to select 10 days' worth of orders from your table, so you only need to read 10 smaller tables instead of scanning years of order data to find the 10 days you are after.
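The 10-day case can be sketched in Python as follows. This is a toy model under stated assumptions, not a real storage engine: each partition is a dictionary entry keyed by `create_date`, and the two years of sample orders are invented for the example.

```python
from collections import defaultdict
from datetime import date, timedelta


def partition_by_date(rows, date_column):
    """Group rows into one bucket per date, like one hidden table per partition."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[date_column]].append(row)
    return dict(partitions)


def select_date_range(partitions, start, end):
    """Read only the partitions whose key falls in [start, end].

    Only the partition keys (metadata) are checked for every partition;
    rows are read solely from the partitions that match.
    """
    wanted = sorted(d for d in partitions if start <= d <= end)
    rows = [row for d in wanted for row in partitions[d]]
    return rows, len(wanted)


# Made-up data: two years of orders, three orders per day.
first_day = date(2021, 1, 1)
orders = [
    {"order_id": day * 3 + n, "create_date": first_day + timedelta(days=day)}
    for day in range(730)
    for n in range(3)
]

partitions = partition_by_date(orders, "create_date")
rows, partitions_read = select_date_range(
    partitions, date(2022, 6, 1), date(2022, 6, 10)
)
```

Here the query reads 10 of the 730 partitions and never looks at the rows in the other 720.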

Here's an example from the Microsoft website where horizontal partitioning is done on the name column, with two "shards" based on the names' alphabetical order:

[Image: a table horizontally partitioned on the name column into two shards]

Table distribution is a concept that is only available on MPP-type RDBMSs like Azure DW or Teradata. It's easiest to think of it as a hardware concept that is somewhat divorced from the data. Azure gives you a lot of control here, whereas some other MPP databases base distribution on primary keys. Partitioning is available on nearly every RDBMS (MPP or not), and it's easiest to think of it as a storage/software concept that is defined by and dependent on the data in the table.

In the end, both work to solve the same problem. But nearly every RDBMS concept (indexing, disk storage, optimization, partitioning, distribution, etc.) is there to solve the same problem, namely: "How do I get the exact data I need out as quickly as possible?" When you combine these concepts to match your data retrieval needs, you make your SQL requests crazy fast, even against monstrously huge data.
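As a rough illustration of combining the two, the Python sketch below places each row first on a node (by hashing the customer) and then in a per-date partition on that node. Everything here is hypothetical: the node count, the `crc32` hash, the `storage` layout, and the sample data are assumptions for the example, not the actual Azure DW internals.

```python
import zlib
from collections import defaultdict
from datetime import date, timedelta

NODE_COUNT = 4  # hypothetical


def node_for(customer: str) -> int:
    """Distribution step: hash the customer to choose a node."""
    return zlib.crc32(customer.encode("utf-8")) % NODE_COUNT


# storage[node][create_date] -> rows; each node keeps its own set of partitions.
storage = defaultdict(lambda: defaultdict(list))

# Made-up data: 90 days of orders, one order per day for each of 5 customers.
first_day = date(2022, 1, 1)
for day in range(90):
    for c in range(5):
        row = {"customer": f"cust{c}", "create_date": first_day + timedelta(days=day)}
        storage[node_for(row["customer"])][row["create_date"]].append(row)


def orders_for(customer, start, end):
    """Visit one node (distribution), then only the matching partitions on it."""
    node = node_for(customer)
    touched = sorted(d for d in storage[node] if start <= d <= end)
    rows = [r for d in touched for r in storage[node][d] if r["customer"] == customer]
    return rows, node, len(touched)


rows, node, partitions_read = orders_for("cust2", date(2022, 2, 1), date(2022, 2, 10))
```

A lookup for one customer over 10 days visits a single node and 10 of its 90 partitions. That is the sense in which the two concepts narrow the same search from different directions.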

176

answered Sep 30 '22 04:09

JNevill


Tags:

sql

database

azure-sql-database

partitioning

azure-sqldw

Amit Soni
