
Azure table storage performance - REST vs. StorageClient

I am working with Azure Table Storage and trying to figure out the best way to increase performance. The queries that I perform are very simple: either an exact select using partition key and row key, or a where clause with a list of values (e.g., WHERE x==1 or x==2 or x==3). Once I get the data back, I don't track it in a data context (I have no need for change tracking). Saving works the same way: I add an entity to the context only so that I can save it.

At the moment, I am using the .NET library (storage client). As I don't use the change tracking and other features of the TableServiceContext, I am thinking about using the HTTP API directly. Has anyone tried both options? If so, what kind of performance difference did you see?
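For reference, here is a minimal sketch of what the two request shapes could look like when built by hand against the HTTP API. The function names and the account and table arguments are made up for illustration, and authentication headers are left out entirely, so treat this as a rough outline rather than a working client.

```python
from urllib.parse import quote


def _odata_literal(value: str) -> str:
    # OData string literals escape a single quote by doubling it.
    return "'" + value.replace("'", "''") + "'"


def point_url(account: str, table: str, pk: str, rk: str) -> str:
    """URL for an exact lookup by partition key and row key (hypothetical helper)."""
    key = f"PartitionKey={_odata_literal(pk)},RowKey={_odata_literal(rk)}"
    # Keep '=', ',' and the quote character readable; percent-encode the rest.
    encoded_key = quote(key, safe="=,'")
    return f"https://{account}.table.core.windows.net/{table}({encoded_key})"


def list_filter_url(account: str, table: str, pk: str, row_keys) -> str:
    """URL for a 'WHERE x==1 or x==2 ...' style query (hypothetical helper)."""
    ors = " or ".join(f"RowKey eq {_odata_literal(rk)}" for rk in row_keys)
    flt = f"PartitionKey eq {_odata_literal(pk)} and ({ors})"
    return f"https://{account}.table.core.windows.net/{table}()?$filter={quote(flt)}"
```

The first form addresses a single entity directly, while the second sends a filter expression that the service has to evaluate.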

Thanks, Erick

Erick T asked Aug 19 '11 07:08




2 Answers

Table storage can be a fickle beast to optimize, and a variety of factors affect its performance. Here are just a few off the top of my head:

  1. Using a Partition Key in every query is a must. If you are not doing this, you are doing it wrong. If you use a single PK and a single RK (and only those two), it is no longer a query but a direct resource GET, and it should return almost immediately.
  2. Do not use OR-based queries. They cause a full table scan and your performance will be horrible. Instead, split the OR clause into separate queries and run them in parallel.
  3. Partitioning strategy will have a major impact. How many partitions you have and how often you hit them (to warm them up and cause the underlying partition servers to load balance) will cause dramatic differences. The size of the partition has a big impact here too. Sequential partition keys are often a bad idea.
  4. Small requests can benefit from turning off Nagle's algorithm (as mentioned in the other answer).
  5. Turning off context tracking and Expect: 100-Continue can help as well. In .NET these are MergeOption.NoTracking on the context and ServicePointManager.Expect100Continue = false.
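To illustrate point 2, here is a minimal sketch of fanning an OR list out into parallel point lookups. It is written in Python only to keep it short; `fetch_one` is a hypothetical caller-supplied function that performs a single PK + RK lookup (through the storage client or a raw HTTP GET), and the worker count is an arbitrary assumption, not a recommendation.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_many(fetch_one, keys, max_workers=8):
    """Run one point lookup per (partition_key, row_key) pair in parallel.

    fetch_one is a hypothetical callable taking (partition_key, row_key)
    and returning the entity, or None when it does not exist.
    """
    keys = list(keys)
    if not keys:
        return {}
    workers = min(max_workers, len(keys))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the input keys.
        results = pool.map(lambda key: fetch_one(*key), keys)
        return {key: entity for key, entity in zip(keys, results) if entity is not None}
```

Each lookup stays a cheap point GET, so the total latency is roughly that of the slowest single request rather than that of a scan.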

There are more, I suppose, depending on your application, but the ones above are generally where I start.

dunnry answered Oct 24 '22 03:10


Have you turned Nagle off?

  • Nagle’s Algorithm is Not Friendly towards Small Requests
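For anyone unsure what "turning Nagle off" means at the socket level, here is a small Python sketch (the helper name is made up). With the .NET storage client you would not touch sockets yourself; the corresponding switch there is ServicePointManager.UseNagleAlgorithm = false, set before the first request goes out.

```python
import socket


def open_nodelay_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY=1 sends small writes immediately instead of
    # buffering them while waiting to coalesce with later data.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock
```

This mostly matters for small requests, which is what typical table storage calls are.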

Of Interest:

  • Maximizing Throughput in Windows Azure – Part 1
Mitch Wheat answered Oct 24 '22 03:10