Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Performance - Table Service, SQL Azure - insert. Query speed on large amount of data

I'd read many posts and articles about comparing SQL Azure and Table Service and most of them told that Table Service is more scalable than SQL Azure.

  • http://www.silverlight-travel.com/blog/2010/03/31/azure-table-storage-sql-azure/
  • http://www.intertech.com/Blog/post/Windows-Azure-Table-Storage-vs-Windows-SQL-Azure.aspx
  • Microsoft Azure Storage vs. Azure SQL Database
  • https://social.msdn.microsoft.com/Forums/en-US/windowsazure/thread/2fd79cf3-ebbb-48a2-be66-542e21c2bb4d
  • https://blogs.msdn.com/b/windowsazurestorage/archive/2010/05/10/windows-azure-storage-abstractions-and-their-scalability-targets.aspx
  • https://stackoverflow.com/questions/2711868/azure-performance
  • http://vermorel.com/journal/2009/9/17/table-storage-or-the-100x-cost-factor.html
  • Azure Tables or SQL Azure?
  • http://www.brentozar.com/archive/2010/01/sql-azure-frequently-asked-questions/
  • https://code.google.com/p/lokad-cloud/wiki/FatEntities

Sorry for http, I'm new user >_< But http://azurescope.cloudapp.net/BenchmarkTestCases/ benchmark shows different picture.

My case. Using SQL Azure: one table with many inserts, about 172,000,000 per day(2000 per second). Can I expect good perfomance for inserts and selects when I have 2 million records or 9999....9 billion records in one table?

Using Table Service: one table with some number of partitions. Number of partitions can be large, very large.

Question #1: is Table service has some limitations or best practice for creating many, many, many partitions in one table?

Question #2: in a single partition I have a large amount of small entities, like in SQL Azure example above. Can I expect good perfomance for inserts and selects when I have 2 million records or 9999 billion entities in one partition?

I know about sharding or partition solutions, but it is a cloud service, is cloud not powerfull and do all work without my code skills?

Question #3: Can anybody show me benchmarks for quering on large amount of datas for SQL Azure and Table Service?

Question #4: May be you could suggest a better solution for my case.

like image 316
tartrius Avatar asked Oct 06 '10 12:10

tartrius


People also ask

How can you improve the performance of an insert query?

To optimize insert speed, combine many small operations into a single large operation. Ideally, you make a single connection, send the data for many new rows at once, and delay all index updates and consistency checking until the very end.

How can you improve the performance of SQL Azure databases?

There are three primary options for Automatic Tuning with Azure SQL Database: CREATE INDEX: Creates new indices that can improve the performance. DROP INDEX: Drops redundant and unused indices (>90 days) FORCE LAST GOOD PLAN: Identifies queries using the last known good execution plan.

What will happen when SQL Azure database will reach the max size?

When data space used reaches the maximum data size limit, either at the database level or at the elastic pool level, inserts and updates that increase data size fail and clients receive an error message. SELECT and DELETE statements remain unaffected.

How fast is Azure table storage?

With Azure Table, your throughput is limited to 20k operations per second while with Cosmos DB throughput is supported for up to 10 million operations per second. Additionally, Cosmos DB provides automatic indexing of properties. This can be used during querying to increase performance.


1 Answers

Short Answer

  1. I haven't seen lots of partitions cause Azure Tables (AZT) problems, but I don't have this volume of data.
  2. The more items in a partition, the slower queries in that partition
  3. Sorry no, I don't have the benchmarks
  4. See below

Long Answer

In your case I suspect that SQL Azure is not going work for you, simply because of the limits on the size of a SQL Azure database. If each of those rows you're inserting are 1K with indexes you will hit the 50GB limit in about 300 days. It's true that Microsoft are talking about databases bigger than 50GB, but they've given no time frames on that. SQL Azure also has a throughput limit that I'm unable to find at this point (I pretty sure it's less than what you need though). You might be able to get around this by partitioning your data across more than one SQL Azure database.

The advantage SQL Azure does have though is the ability to run aggregate queries. In AZT you can't even write a select count(*) from customer without loading each customer.

AZT also has a limit of 500 transactions per second per partition, and a limit of "several thousand" per second per account.

I've found that choosing what to use for your partition key (PK) and row key depends (RK) on how you're going to query the data. If you want to access each of these items individually, simply give each row it's own partition key and a constant row key. This will mean that you have lots of partition.

For the sake of example, if these rows you were inserting were orders and the orders belong to a customer. If it was more common for you to list orders by customer you would have PK = CustomerId, RK = OrderId. This would mean to find orders for a customer you simply have to query on the partition key. To get a specific order you'd need to know the CustomerId and the OrderId. The more orders a customer had, the slower finding any particular order would be.

If you just needed to access orders just by OrderId, then you would use PK = OrderId, RK = string.Empty and put the CustomerId in another property. While you can still write a query that brings back all orders for a customer, because AZT doesn't support indexes other than on PartitionKey and RowKey if your query doesn't use a PartitionKey (and sometimes even if it does depending on how you write them) will cause a table scan. With the number of records you're talking about that would be very bad.

In all of the scenarios I've encountered, having lots of partitions doesn't seem to worry AZT too much.

Another way you can partition your data in AZT that is not often mentioned is to put the data in different tables. For example, you might want to create one table for each day. If you want to run a query for last week, run the same query against the 7 different tables. If you're prepared to do a bit of work on the client end you can even run them in parallel.

like image 86
knightpfhor Avatar answered Oct 16 '22 03:10

knightpfhor