
Partition or Index large table in SQL Server

I have a large table with 4+ billion rows and 50 columns, most of which are datetime or numeric, plus a few varchar columns.

Data will be inserted into the table on a weekly basis (about 20 million rows).

I expect queries with WHERE clauses on some of the datetime columns and a couple of the varchar columns. There is no primary key in the table.

There are no indexes, and the table is not partitioned. I am using SQL Server 2016.

I understand that I need to partition or index the table, but I am not sure which approach to take, or whether to do both.

Since the table is large, should I create the indexes first or the partitions first? If I create the indexes and then the partitions, what should I do to maintain them as new data comes in weekly?

EDIT: Also, only minimal updates and deletes are expected on the table.

asked May 09 '19 09:05 by siddharth




1 Answer

I understand that I need to partition or index the table

You need to understand what you gain from partitioning. It is not at all the case that SQL Server requires partitioning on big tables to function adequately. SQL Server scales to arbitrary table sizes without any inherent issues.

Common benefits of partitioning are:

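  1. Mass deletion in constant time (emptying or switching out a whole partition instead of deleting rows)
  2. Different storage for older partitions
  3. Not backing up old partitions (by keeping them on read-only filegroups that are backed up once)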
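Benefit 1 usually means emptying whole partitions rather than running a DELETE. The sketch below makes a few assumptions: a RANGE RIGHT partition function with sorted weekly boundaries on a datetime column, and a hypothetical table `dbo.BigTable`. It works out which partitions hold only rows older than a cutoff and builds the SQL Server 2016+ `TRUNCATE TABLE ... WITH (PARTITIONS ...)` statement for them. That statement requires every index on the table to be aligned with the partition scheme. The sketch only produces text and has not been run against a server.

```python
from __future__ import annotations

from bisect import bisect_right
from datetime import date
from typing import Optional


def purge_statement(table: str, boundaries: list[date], cutoff: date) -> Optional[str]:
    """Build a TRUNCATE for every partition whose rows are all older than cutoff.

    Assumes a RANGE RIGHT function with these sorted boundaries: partition 1 is
    unbounded below, and partition k (k >= 2) holds [boundaries[k-2], boundaries[k-1]).
    """
    # Partition k lies entirely before cutoff exactly when its upper boundary
    # boundaries[k-1] <= cutoff; bisect_right counts how many boundaries qualify.
    n = bisect_right(boundaries, cutoff)
    if n == 0:
        return None  # no partition is entirely older than the cutoff
    parts = "1" if n == 1 else f"1 TO {n}"
    # SQL Server 2016+ syntax: deallocates the partitions' pages instead of
    # logging each deleted row. Requires all indexes to be partition-aligned.
    return f"TRUNCATE TABLE {table} WITH (PARTITIONS ({parts}));"


if __name__ == "__main__":
    weeks = [date(2019, 1, 7), date(2019, 1, 14), date(2019, 1, 21)]
    # Partitions 1 (< Jan 7) and 2 ([Jan 7, Jan 14)) are entirely older than Jan 14.
    print(purge_statement("dbo.BigTable", weeks, date(2019, 1, 14)))
    # prints: TRUNCATE TABLE dbo.BigTable WITH (PARTITIONS (1 TO 2));
```

After a purge, the emptied boundaries can be removed with `ALTER PARTITION FUNCTION ... MERGE RANGE` so that the number of partitions does not grow indefinitely.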

Sometimes in special situations (e.g. columnstore), partitioning can help as a strategy to speed up queries. Normally, indexing is better for that.

Essentially, partitioning splits the table physically into multiple sub-tables. Most often this has a negative effect on query plans. Indexes are perfectly capable of restricting the set of data that needs to be touched. Partitions are worse for that: partition elimination only works for predicates on the partitioning column, and only at the granularity of whole partitions.
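The following is a small model of partition elimination written in plain Python, with no SQL Server involved. It assumes a RANGE RIGHT function with weekly boundaries. `partition_for` mimics what `$PARTITION` returns, and `partitions_touched` is a hypothetical helper that lists the partitions left to read for a half-open range predicate. The model shows that partitioning narrows a query only when the query filters on the partitioning column. A filter on a varchar column alone still reads every partition unless an index on that column exists.

```python
from __future__ import annotations

from bisect import bisect_left, bisect_right
from datetime import date, timedelta
from typing import Optional


def partition_for(boundaries: list[date], value: date) -> int:
    """1-based partition number, like $PARTITION for a RANGE RIGHT function."""
    return bisect_right(boundaries, value) + 1


def partitions_touched(boundaries: list[date], lo: Optional[date] = None,
                       hi: Optional[date] = None) -> list[int]:
    """Partitions left after elimination for `lo <= col < hi` on the partitioning column.

    None means that side is unbounded. With no predicate on the partitioning
    column (lo = hi = None), no partition can be eliminated.
    """
    first = 1 if lo is None else partition_for(boundaries, lo)
    # Values just below hi live in the partition whose upper boundary is the
    # first boundary >= hi, hence bisect_left.
    last = len(boundaries) + 1 if hi is None else bisect_left(boundaries, hi) + 1
    return list(range(first, last + 1))


if __name__ == "__main__":
    # One year of hypothetical weekly boundaries starting Monday 2019-01-07.
    weeks = [date(2019, 1, 7) + timedelta(weeks=i) for i in range(52)]
    # Date-range query on the partitioning column: 2 of 53 partitions are read.
    print(partitions_touched(weeks, date(2019, 3, 4), date(2019, 3, 18)))  # [10, 11]
    # Query filtering only on a varchar column: all 53 partitions are read.
    print(len(partitions_touched(weeks)))  # 53
```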

Most of the queries will filter on the datetime columns and on some of the varchar columns, for example: get data for a certain date range for a certain entity. With indexes, fragmentation will be heavy because of the new inserts, and rebuilding/reorganizing the indexes will also take a lot of time. I can do that, but again I am not sure which approach to take.

It seems you can best solve this by indexing:

  1. Index according to the queries you expect.
  2. Maintain the indexes properly. This is not too hard. For example, rebuild them after the weekly load.
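Step 2 is often handled by checking fragmentation after the load and then choosing, per index or per partition, between `REORGANIZE` and `REBUILD`. The sketch below builds the `ALTER INDEX` statement from the `avg_fragmentation_in_percent` and `page_count` values that `sys.dm_db_index_physical_stats` reports. The thresholds (1000 pages, 5% and 30%) are a widely quoted rule of thumb, not something this answer prescribes. The index and table names are hypothetical.

```python
from __future__ import annotations

from typing import Optional


def index_maintenance(index: str, table: str, avg_fragmentation_in_percent: float,
                      page_count: int, partition_number: Optional[int] = None) -> Optional[str]:
    """Return an ALTER INDEX statement for one index (or one partition), or None.

    Rule-of-thumb thresholds: skip small or barely fragmented indexes,
    REORGANIZE up to 30% fragmentation, REBUILD above that.
    """
    if page_count < 1000 or avg_fragmentation_in_percent < 5:
        return None  # not worth the I/O
    action = "REORGANIZE" if avg_fragmentation_in_percent <= 30 else "REBUILD"
    # On a partitioned index, PARTITION = n limits the work to one partition,
    # e.g. the one that received this week's load.
    target = "" if partition_number is None else f" PARTITION = {partition_number}"
    return f"ALTER INDEX {index} ON {table} {action}{target};"


if __name__ == "__main__":
    print(index_maintenance("ix_BigTable_EntityCode", "dbo.BigTable", 42.0, 250_000))
    # prints: ALTER INDEX ix_BigTable_EntityCode ON dbo.BigTable REBUILD;
    print(index_maintenance("ix_BigTable_EntityCode", "dbo.BigTable", 12.5, 80_000, 53))
    # prints: ALTER INDEX ix_BigTable_EntityCode ON dbo.BigTable REORGANIZE PARTITION = 53;
```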

Since the table is large, should I create the indexes first or should I create the partitions first?

Set up the partitioning objects (partition function and partition scheme) first. Then create or rebuild the clustered index on the new partition scheme. If possible, drop the other indexes first and recreate them afterwards (this might not work due to availability restrictions).
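That order can be written out as a script. The sketch below only generates the T-SQL text, in execution order, and rests on several assumptions: weekly RANGE RIGHT boundaries on a datetime column, every partition on the PRIMARY filegroup, and hypothetical names (`pf_Weekly`, `ps_Weekly`, `cix_BigTable`, and whatever table and column names are passed in). Putting the partitioning column second in the nonclustered index is just one choice that suits "date range for a certain entity" queries. None of this has been run against a server.

```python
from __future__ import annotations

from datetime import date


def initial_partitioning_ddl(table: str, part_col: str, boundaries: list[date],
                             nc_col: str) -> list[str]:
    """T-SQL statements, in execution order, to partition a table that has no indexes yet."""
    # 'YYYYMMDD' literals are read the same way under every language/DATEFORMAT setting.
    values = ", ".join(f"'{b:%Y%m%d}'" for b in boundaries)
    return [
        # 1. Partition function: with RANGE RIGHT each boundary value starts a new partition.
        f"CREATE PARTITION FUNCTION pf_Weekly (datetime) AS RANGE RIGHT FOR VALUES ({values});",
        # 2. Partition scheme: here every partition maps to the PRIMARY filegroup.
        "CREATE PARTITION SCHEME ps_Weekly AS PARTITION pf_Weekly ALL TO ([PRIMARY]);",
        # 3. Building the clustered index on the scheme moves the rows into their partitions.
        f"CREATE CLUSTERED INDEX cix_BigTable ON {table} ({part_col}) ON ps_Weekly ({part_col});",
        # 4. Nonclustered indexes last; without an ON clause they are placed on the
        #    table's partition scheme, i.e. aligned.
        f"CREATE NONCLUSTERED INDEX ix_BigTable_{nc_col} ON {table} ({nc_col}, {part_col});",
    ]


if __name__ == "__main__":
    for stmt in initial_partitioning_ddl("dbo.BigTable", "LoadDate",
                                         [date(2019, 5, 6), date(2019, 5, 13)], "EntityCode"):
        print(stmt)
```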

what should I do to maintain these with new data coming in weekly.

What concerns do you have? New data will be stored in the appropriate partitions automatically. Make sure to create new partitions before loading the data, and keep partitions ready two weeks in advance. The latest partitions must always be empty, because splitting a partition that already contains rows is costly (the rows have to be moved).
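The two-weeks-ahead rule can be automated. The sketch below assumes the same kind of weekly RANGE RIGHT function, with hypothetical names `pf_Weekly` and `ps_Weekly` and the PRIMARY filegroup. It computes which boundaries are missing before the next load and emits one `NEXT USED` plus `SPLIT RANGE` pair per new boundary. Provided no loaded row is later than the current last boundary, every split divides the empty rightmost partition, which is the cheap case described above.

```python
from __future__ import annotations

from datetime import date, timedelta


def split_statements(boundaries: list[date], next_load: date,
                     weeks_ahead: int = 2) -> list[str]:
    """T-SQL that adds weekly boundaries until they reach next_load + weeks_ahead weeks.

    boundaries: the existing, sorted, non-empty list of weekly RANGE RIGHT
    boundaries. Assumes no loaded row is >= boundaries[-1], so each SPLIT
    divides an empty partition and no rows have to move.
    """
    statements = []
    last = boundaries[-1]
    target = next_load + timedelta(weeks=weeks_ahead)
    while last < target:
        last += timedelta(weeks=1)
        # Designate the filegroup for the partition the following SPLIT creates.
        statements.append("ALTER PARTITION SCHEME ps_Weekly NEXT USED [PRIMARY];")
        statements.append(f"ALTER PARTITION FUNCTION pf_Weekly() SPLIT RANGE ('{last:%Y%m%d}');")
    return statements


if __name__ == "__main__":
    existing = [date(2019, 5, 6), date(2019, 5, 13)]
    # Before loading the week starting 2019-05-13, boundaries 05-20 and 05-27 are added.
    for stmt in split_statements(existing, date(2019, 5, 13)):
        print(stmt)
```

Running this as a step just before each weekly load keeps the rightmost partitions empty without any manual bookkeeping.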

There is no primary key in the table.

Most often this is not a good design. Most tables should have a primary key and a clustered index. If there is no natural key, use an artificial one such as a bigint identity.
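For this table, choosing `bigint` over `int` is not cosmetic. Here is a quick back-of-the-envelope check using only the figures from the question (4 billion rows now, about 20 million per week) and the documented limits of the two SQL Server integer types:

```python
INT_MAX = 2**31 - 1     # upper limit of SQL Server int
BIGINT_MAX = 2**63 - 1  # upper limit of SQL Server bigint

current_rows = 4_000_000_000      # "4 Billion+" rows from the question
rows_per_year = 20_000_000 * 52   # about 20 million rows per weekly load

# An int identity seeded at 1 cannot even number the existing rows.
int_is_enough = current_rows <= INT_MAX
# Years of weekly loads a bigint identity seeded at 1 can absorb after the existing rows.
bigint_years_left = (BIGINT_MAX - current_rows) // rows_per_year

print(int_is_enough)      # False
print(bigint_years_left)  # roughly 8.9 billion
```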


You can definitely apply partitioning, but my feeling is that it will not gain you what you might expect. It will, however, force you to take on additional maintenance burdens, may reduce performance, and carries a risk of mistakes that threaten availability. Simplicity is important.

answered Oct 26 '22 02:10 by usr