Azure Cosmos DB partition key - is primary key acceptable?

Our Azure Cosmos DB collection has grown large enough to require a partition key. From my reading on the subject, I get the impression that the best partition key is one that provides even distribution and high cardinality. This article from Microsoft discusses it.
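
One rough way to compare candidate keys before committing to one is to measure, over a sample of your documents, how many distinct values each field has (cardinality, i.e. the number of logical partitions) and how large the biggest logical partition would be (evenness). The sketch below is plain Python over an in-memory list; the document fields (`customerId`, `country`) are hypothetical, and it does not call the Cosmos DB SDK.

```python
from collections import Counter

# Hypothetical sample documents; the field names are illustrative only.
docs = [
    {"id": "1", "customerId": "c1", "country": "US"},
    {"id": "2", "customerId": "c1", "country": "US"},
    {"id": "3", "customerId": "c2", "country": "US"},
    {"id": "4", "customerId": "c3", "country": "DE"},
    {"id": "5", "customerId": "c4", "country": "US"},
    {"id": "6", "customerId": "c5", "country": "FR"},
]

def key_stats(documents, key_field):
    """Return (cardinality, largest_share) for a candidate partition key.

    cardinality: number of distinct values, i.e. number of logical partitions.
    largest_share: fraction of documents in the biggest logical partition;
    the closer it is to 1 / cardinality, the more even the distribution.
    """
    counts = Counter(doc[key_field] for doc in documents)
    return len(counts), max(counts.values()) / len(documents)

for name in ("id", "customerId", "country"):
    cardinality, share = key_stats(docs, name)
    print(f"{name:10} cardinality={cardinality} largest_share={share:.2f}")
```

With `id` as the key, cardinality equals the document count and every logical partition holds exactly one document, which is the situation this question is about.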

Using a primary key as a partition key provides even distribution, but each logical partition would then contain only one document. If this is my only option, is that a bad thing? The aforementioned article gives a few examples and seems to indicate that the primary key should be used as the partition key in those cases. In Azure Cosmos DB, the partition key defines logical partitions, which Cosmos DB maps onto a smaller set of physical partitions. So it wouldn't lead to each document sitting on its own disk, but it seems like it could lead to a bloated index.

Is using a primary key as a partition key a common practice? Are there any downsides to it?

Scotty H asked Jun 27 '18 21:06


2 Answers

The choice of partition key deserves careful consideration. Since using the primary key as the partition key is your only option, I'll just discuss some of its possible downsides for your reference.

In terms of performance, if your query does not filter on the partition key, it has to run across partitions, which reduces query performance. If the amount of data is small, the effect may be negligible.
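
To make the cross-partition point concrete, here is a toy routing model: an equality filter on the partition key can be sent to a single physical partition, while a filter on any other field has to visit all of them. The partition count, the CRC32-modulo placement, and the `customerId` field are assumptions for illustration; this is not how Cosmos DB actually places data or plans queries.

```python
import zlib

NUM_PHYSICAL = 4  # assumption: a small, fixed number of physical partitions

def physical_partition(pk_value: str) -> int:
    # Illustrative CRC32-modulo placement, NOT Cosmos DB's actual scheme.
    return zlib.crc32(pk_value.encode("utf-8")) % NUM_PHYSICAL

def partitions_to_query(filter_field: str, filter_value: str, pk_field: str) -> list:
    """Return the physical partitions an equality query has to visit."""
    if filter_field == pk_field:
        # Equality filter on the partition key: route to a single partition.
        return [physical_partition(filter_value)]
    # Filter on any other field: fan out to every physical partition.
    return list(range(NUM_PHYSICAL))

# Container partitioned on /id (the primary key).
by_id = partitions_to_query("id", "42", pk_field="id")
by_customer = partitions_to_query("customerId", "c1", pk_field="id")
print(len(by_id), len(by_customer))  # 1 4
```

With `/id` as the key, lookups by id stay targeted, but every query on another field becomes a fan-out, and the fan-out cost grows with the number of physical partitions.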

In terms of cost, Cosmos DB charges primarily for storage and request unit (RU) consumption. As you said, choosing the primary key as the partition key will lead to more index storage. If most of your queries are cross-partition, it also leads to higher RU consumption.

In terms of stored procedures, triggers, and UDFs: you can't run cross-partition transactions through stored procedures and triggers. They are scoped to a single partition, so you must specify the partition key value when you execute them, and with the primary key as the partition key, each partition contains only one document.
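
The transactional limitation can be sketched with a toy class that, like a Cosmos DB stored procedure, accepts work for one partition key value only. `HypotheticalBatch` and its `upsert` method are invented for illustration and are not part of any Cosmos DB SDK; the point is only to show how the choice of key bounds what one transaction can touch.

```python
from dataclasses import dataclass, field

@dataclass
class HypotheticalBatch:
    """Toy model of a transaction scoped to one logical partition (illustrative only)."""
    partition_key: str
    ops: list = field(default_factory=list)

    def upsert(self, doc: dict, pk_field: str = "id") -> None:
        # Reject any document whose partition key value differs from the batch's.
        if doc[pk_field] != self.partition_key:
            raise ValueError(
                f"document {doc['id']!r} belongs to partition "
                f"{doc[pk_field]!r}, not {self.partition_key!r}"
            )
        self.ops.append(("upsert", doc))

# With /id as the partition key, a transaction can only ever touch one document.
batch = HypotheticalBatch(partition_key="order-1")
batch.upsert({"id": "order-1", "status": "paid"})
try:
    batch.upsert({"id": "order-2", "status": "paid"})
except ValueError as err:
    print("rejected:", err)

# With a hypothetical /customerId key, several orders of one customer fit together.
cust_batch = HypotheticalBatch(partition_key="c1")
cust_batch.upsert({"id": "order-1", "customerId": "c1"}, pk_field="customerId")
cust_batch.upsert({"id": "order-2", "customerId": "c1"}, pk_field="customerId")
```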

Note that once a partition key is set, it cannot be removed or changed later. So consider it carefully before you choose, and make sure you back up your data.
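
Because the key can't be changed in place, switching keys in practice means creating a new container with the desired key and copying the data into it, leaving the original untouched until the copy is verified. The sketch below models only the regrouping step with in-memory dicts; `migrate` is a hypothetical helper, and the actual copy would be done with whatever Cosmos DB tooling you use.

```python
from collections import defaultdict

def migrate(items, new_pk_field):
    """Copy items into a new 'container' grouped by a new partition key.

    The source list is not modified; the result maps each logical
    partition value to the list of (copied) items it contains.
    """
    target = defaultdict(list)
    for item in items:
        target[item[new_pk_field]].append(dict(item))  # copy, don't move
    return dict(target)

old_container = [
    {"id": "1", "customerId": "c1"},
    {"id": "2", "customerId": "c1"},
    {"id": "3", "customerId": "c2"},
]
new_container = migrate(old_container, "customerId")
print({pk: [d["id"] for d in docs] for pk, docs in new_container.items()})
# {'c1': ['1', '2'], 'c2': ['3']}
```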

For more details, refer to the official documentation.

Jay Gong answered Sep 18 '22 08:09


No, there is no downside to it. Strive for a partition key with high cardinality. Don't worry about indexes, physical partitions, etc.

You can have millions of partition key values and only 10 physical partitions. Cosmos DB creates physical partitions behind the scenes, so you should never need to worry about them.
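
A small simulation shows why a huge number of logical partitions is not a problem: many distinct key values simply hash onto a few physical partitions, and a high-cardinality key such as `/id` spreads them almost perfectly evenly. The SHA-256-modulo placement below is an assumption made for illustration; Cosmos DB's real placement scheme is different and fully managed by the service.

```python
import hashlib
from collections import Counter

NUM_PHYSICAL = 10  # mirrors the "10 physical partitions" in the answer

def physical_partition(pk_value: str) -> int:
    # Illustrative hash-modulo placement; Cosmos DB's real scheme differs.
    digest = hashlib.sha256(pk_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PHYSICAL

# 100,000 distinct partition key values, e.g. one per document when using /id.
keys = [f"doc-{i}" for i in range(100_000)]
placement = Counter(physical_partition(k) for k in keys)

print(len(placement))                                   # physical partitions in use
print(min(placement.values()), max(placement.values()))  # smallest and largest load
```

Each physical partition ends up with roughly 10,000 logical partitions, so the number of distinct key values and the number of physical partitions are independent concerns.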

Rafat Sarosh answered Sep 21 '22 08:09