Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Retrieving many rows using a TableBatchOperation is not supported?

Here is a piece of code that initialize a TableBatchOperation designed to retrieve two rows in a single batch:

 TableBatchOperation batch = new TableBatchOperation();
 batch.Add(TableOperation.Retrieve("somePartition", "rowKey1"));
 batch.Add(TableOperation.Retrieve("somePartition", "rowKey2")); 
 //second call throws an ArgumentException:
 //"A batch transaction with a retrieve operation cannot contain 
 //any other operation"

As mentionned, an exception is thrown, and it seems not supported to retrieve N rows in a single batch. This is a big deal to me, as I need to retrieve about 50 rows per request. This issue is as much performance wise as cost wise. As you may know, Azure Table Storage pricing is based on the amount of transactions, which means that 50 retrieve operations is 50 times more expensive than a single batch operation.

Have I missed something?

Side note I'm using the new Azure Storage api 2.0. I've noticed this question has never been raised on the web. This constraint might have been added recently?

edit

I found a related question here: Very Slow on Azure Table Storage Query on PartitionKey/RowKey List. It seems using TableQuery with "or" on rowkeys will results with a full table scan. There's really a serious issue here...

like image 259
uzul Avatar asked Jan 06 '13 04:01

uzul


1 Answers

When designing your Partition Key (PK) and Row Key (RK) scheme in Azure Table Storage (ATS) your primary consideration should be how you're going to retrieve the data. As you've said each query you run costs both money, but more importantly time so you need to get all of the data back in one efficient query. The efficient queries that you can run on ATS are of these types:

  • Exact PK and RK
  • Exact PK, RK range
  • PK Range
  • PK Range, RK range

Based on your comments I'm guessing you've got some data that is similar to this:

PK    RK     Data
Guid1 A      {Data:{...}, RelatedRows: [{PK:"Guid2", RK:"B"}, {PK:"Guid3", RK:"C"}]}
Guid2 B      {Data:{...}, RelatedRows: [{PK:"Guid1", RK:"A"}]
Guid3 C      {Data:{...}, RelatedRows: [{PK:"Guid1", RK:"A"}];}

and you've retrieved the data at Guid1, and now you need to load Guid2 and Guid3. I'm also presuming that these rows have no common denominator like they're all for the same user. With this in mind I'd create an extra "index table" which could look like this:

PK      RK      Data
Guid1-A Guid2-B {Data:{....}}
Guid1-A Guid3-C {Data:{....}}
Guid2-B Guid1-A {Data:{....}}
Guid2-B Guid1-A {Data:{....}}

Where the PK is the combined PK and RK of the parent and the RK is the combined PK and RK of the child row. You can then run a query which says return all rows with PK="Guid1-A" and you will get all related data with just one call (or two calls overall). The biggest overhead this creates is in your writes, so now when you right a row you also have to write rows for each of the related rows as well and also make sure that the data is kept up to date (this may not be an issue for you if this is a write once kind of scenario).

If any of my assumptions are wrong or if you have some example data I can update this answer with more relevant examples.

like image 93
knightpfhor Avatar answered Oct 13 '22 15:10

knightpfhor