Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Azure Table Storage - PartitionKey and RowKey selection to use between query

I am a total newbie with Azure! The purpose is to return the rows based on the timestamp stored in the RowKey. As there is a transaction cost with each query, I want to minimize the number of transactions/queries whilst maintain performance

These are the proposed Partition and Row Keys:

  • Partition Key: TextCache_(AccountID)_(ParentMessageId)
  • Row Key: (DateOfMessage)_(MessageId)

Legend:

  • AccountId - is an integer
  • ParentMessageId - The parent messageId if there is one, blank if it is the parent
  • DateOfMessage - Date the message was created - format will be DateTime.Ticks.ToString("d19")
  • MessageId - the unique Id of the message

I would like to get back from a single query the rows and any childrows that is > or < DateOfMessage_MessageId

Can this be done via my proposed PartitionKeys and RowKeys?

ie.. (in psuedo code)

var results = ctx.PartitionKey.StartsWith(TextCache_AccountId) 
   && ctx.RowKey > (TimeStamp)_MessageId

Secondly, if there I have a number of accounts, and only want to return back the first 10, could it be done via a single query

ie.. (in psuedo code)

var results = ( 
      ( 
        ctx.PartitionKey.StartsWith(TextCache_(AccountId1)) && 
            && ctx.RowKey > (TimeStamp1)_MessageId1 )
      )
      ||
      ( 
        ctx.PartitionKey.StartsWith(TextCache_(AccountId2)) && 
            && ctx.RowKey > (TimeStamp2)_MessageId2 )
      ) ... 
          )
         .Take(10)
like image 817
Jason Jong Avatar asked May 03 '11 02:05

Jason Jong


People also ask

What is PartitionKey and RowKey in Azure Table?

The row key is a unique identifier for an entity within a given partition. Together the PartitionKey and RowKey uniquely identify every entity within a table. The row key is a string value that may be up to 1 KiB in size. You must include the RowKey property in every insert, update, and delete operation.

How should you choose a good partition key for a Table storage implementation?

How should you choose a good partition key for a Table storage implementation? (Choose all that apply.) They should always be unique, like a primary key in a SQL table. You should always use the same partition key for all records. Think about how you're likely to update the data using batch transactions.

Can you query Azure Table storage?

You can write queries against the Table service using LINQ syntax.

How do I retrieve data from Azure Table storage?

Enter an Account Name, Account Key, and Table Name on the Azure Table tab of the New Session dialog. Select either HTTP or HTTPS as the connection Protocol. Ensure that the Analysis Grid viewer is selected in the Start With drop-down list. Start retrieving data by clicking the Start button in the New Session dialog.


1 Answers

The short answer to your questions is yes, but there are some things you need to watch for.

Azure table storage doesn't have a direct equivalent of .StartsWith(). If you're using the storage library in combination with LINQ you can use .CompareTo() (> and < don't translate properly) which will mean that if you run a search for account 1 and you ask the query to return 1000 results, but there are only 600 results for account 1, the last 400 results will be for account 10 (the next account number lexically). So you'll need to be a bit smart about how you deal with your results.

If you padded out the account id with leading 0s you could do something like this (pseudo code here as well)

ctx.PartionKey > "TextCache_0000000001"
&& ctx.PartitionKey < "TextCache_0000000002"
&& ctx.RowKey > "123465798"

Something else to bear in mind is that queries to Azure Tables return their results in PartitionKey then RowKey order. So in your case messages without a ParentMessageId will be returned before messages with a ParentMessageId. If you're never going to query this table by ParentMessageId I'd move this to a property.

If TextCache_ is just a string constant, it's not adding anything by being included in the PartitionKey unless this will actually mean something to your code when it's returned.

While you're second query will run, I don't think it will produce what you're after. If you want the first ten rows in DateOfMessage order, then it won't work (see my point above about sort orders). If you ran this query as it is and account 1 had 11 messages it will return only the first 10 messages related to account 1 regardless if whether account 2 had an earlier message.

While trying to minimise the number of transactions you use is good practice, don't be too concerned about it. The cost of running your worker/web roles will dwarf your transaction costs. 1,000,000 transactions will cost you $1 which is less than the cost of running one small instance for 9 hours.

like image 66
knightpfhor Avatar answered Sep 28 '22 13:09

knightpfhor