I've got a really large table (10+ million rows) that is starting to show signs of performance degradation for queries. Since this table will probably double or triple in size relatively soon I'm looking into partitioning the table to squeeze out some query performance.
The table looks something like this:
CREATE TABLE [my_data] (
[id] [int] IDENTITY(1,1) NOT NULL,
[topic_id] [int] NULL,
[data_value] [decimal](19, 5) NULL
)
So, a bunch of values for any given topic. Queries on this table will always be by topic ID, so there's a clustered index on (id, topic_id).
Anyway, since topic IDs aren't bounded (any number of topics could be added) I'd like to try partitioning this table on a modulus function of the topic IDs. So something like:
topic_id % 4 == 0 => partition 0
topic_id % 4 == 1 => partition 1
topic_id % 4 == 2 => partition 2
topic_id % 4 == 3 => partition 3
However, I haven't seen any way to tell "create partition function" or "create partition scheme" to perform this operation when deciding on a partition.
Is this even possible? How can we make a partition function based on an operation performed on the input value?
Partitioning is based on a key column modulo the number of partitions. This method is similar to hash by field, but involves simpler computation. In data mining, data is often arranged in buckets, that is, each record has a tag containing its bucket number.
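To make the bucketing idea concrete, here is a minimal Python sketch of modulus partitioning. It only illustrates the arithmetic and is not SQL Server code: `bucket_by_modulus` and the sample `topic_ids` are made up for this example. It assumes non-negative integer keys, because Python and T-SQL treat `%` differently for negative operands (T-SQL keeps the sign of the dividend).

```python
from collections import defaultdict


def bucket_by_modulus(keys, partition_count):
    """Group keys by key % partition_count (non-negative integer keys assumed)."""
    buckets = defaultdict(list)
    for key in keys:
        buckets[key % partition_count].append(key)
    return dict(buckets)


# Hypothetical topic IDs, purely for illustration.
topic_ids = [1, 2, 3, 4, 5, 8, 13, 21]
buckets = bucket_by_modulus(topic_ids, 4)

for remainder in sorted(buckets):
    print(remainder, buckets[remainder])
```

Every row for a given topic lands in the same bucket, but nothing guarantees the buckets are evenly sized: that depends on how the IDs are distributed.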
You can use a single partition function to partition multiple objects. Create a partition scheme that maps the partitions of a partitioned table or index to one filegroup or to multiple filegroups. You can use a single partition scheme to partition multiple objects.
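A partition function can't evaluate an expression itself, but it can partition on a computed column. You just need to create your modulus column as a PERSISTED computed column and use that as the partitioning column.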
Blue Peter style, here's one I made earlier (although I'm not 100% sure I have the partition values clause right):
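-- RANGE RIGHT puts each boundary value in the partition to its right,
-- so boundaries 1, 2, 3 give remainders 0, 1, 2 and 3 one partition each.
CREATE PARTITION FUNCTION [PF_PartitonFour] (int)
AS RANGE RIGHT
FOR VALUES (
1,
2,
3)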
GO
CREATE PARTITION SCHEME [PS_PartitionFourScheme]
AS PARTITION [PF_PartitonFour]
TO ([TestPartitionGroup1],
[TestPartitionGroup2],
[TestPartitionGroup3],
[TestPartitionGroup4])
GO
CREATE TABLE [my_data] (
[id] [int] IDENTITY(1,1) NOT NULL,
[topic_id] [int] NULL,
[data_value] [decimal](19, 5) NULL,
-- Persisted computed column used as the partitioning key
[PartitionElement] AS ([topic_id] % 4) PERSISTED
) ON [PS_PartitionFourScheme] (PartitionElement);
GO
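Since the values clause is the part flagged as uncertain, here is a quick way to sanity-check boundary values outside SQL Server. This is only a Python model of the RANGE LEFT / RANGE RIGHT rule as documented (a boundary value belongs to the partition on its left or on its right, respectively). `partition_number` is a made-up helper, not something SQL Server provides, and partitions are numbered from 1 as SQL Server does.

```python
from bisect import bisect_left, bisect_right


def partition_number(value, boundaries, range_side):
    """Return the 1-based partition a value falls into.

    boundaries must be sorted ascending. With RANGE LEFT a boundary value
    belongs to the partition on its left; with RANGE RIGHT, to the one on
    its right.
    """
    if range_side == "LEFT":
        return bisect_left(boundaries, value) + 1
    if range_side == "RIGHT":
        return bisect_right(boundaries, value) + 1
    raise ValueError("range_side must be 'LEFT' or 'RIGHT'")


# The four possible results of topic_id % 4 for non-negative topic IDs.
remainders = [0, 1, 2, 3]

# Candidate definitions to compare.
candidates = {
    "RANGE RIGHT (0, 1, 2)": ([0, 1, 2], "RIGHT"),
    "RANGE RIGHT (1, 2, 3)": ([1, 2, 3], "RIGHT"),
    "RANGE LEFT (0, 1, 2)": ([0, 1, 2], "LEFT"),
}

layouts = {
    name: [partition_number(r, bounds, side) for r in remainders]
    for name, (bounds, side) in candidates.items()
}

for name, layout in layouts.items():
    print(name, "->", layout)
```

Under this model, RANGE RIGHT with boundaries 0, 1, 2 sends remainders 2 and 3 to the same partition and leaves the first partition without any of the four remainders. RANGE RIGHT with 1, 2, 3 and RANGE LEFT with 0, 1, 2 both give each remainder its own partition.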