We are about to implement the Read portion of our CQRS system in-house with the goal being to vastly improve our read performance. Currently our reads are conducted through a web service which runs a Linq-to-SQL query against normalised data, involving some degree of deserialization from an SQL Azure database. The simplified structure of our data is: <ul> <li>User</li> <li>Conversation (Grouping of Messages to the same recipients)</li> <li>Message</li> <li>Recipients (Set of Users)</li> </ul> I want to move this into a denormalized state, so that when a user requests to see a feed of messages it reads from EITHER: A denormalized representation held in Azure Table Storage <ul> <li>UserID as the PartitionKey</li> <li>ConversationID as the RowKey</li> <li>Any volatile data prone to change stored as entities</li> <li>The messages serialized as JSON in an entity</li> <li>The recipients of said messages serialized as JSON in an entity</li> <li>The main problem with this the limited size of a row in Table Storage (960KB)</li> <li>Also any queries on the "volatile data" columns will be slow as they aren't part of the key</li> </ul> A normalized representation held in Azure Table Storage <ul> <li>Different table for Conversation details, Messages and Recipients</li> <li>Partition keys for message and recipients stored on the Conversation table.</li> <li>Bar that; this follows the same structure as above</li> <li>Gets around the maximum row size issue</li> <li>But will the normalized state reduce the performance gains of a denormalized table?</li> </ul> OR A denormalized representation held in SQL Azure <ul> <li>UserID & ConversationID held as a composite primary key</li> <li>Any volatile data prone to change stored in separate columns</li> <li>The messages serialized as JSON in a column</li> <li>The recipients of said messages serialized as JSON in an column</li> <li>Greatest flexibility for indexing and the structure of the denormalized data</li> <li>Much slower performance than Table Storage queries</li> </ul> What I'm asking is whether anyone has any experience implementing a denormalized structure in Table Storage or SQL Azure, which would you choose? Or is there a better approach I've missed? My gut says the normalized (At least to some extent) data in Table Storage would be the way to go; however I am worried it will reduce the performance gains to conduct 3 queries in order to grab all the data for a user.

I won't try to argue on the exact definition of CQRS. As we are talking about Azure, I'll use it's docs as a reference. From there we can find that: <ol> <li> CQRS doesn't necessary requires that you use a separate read storage. <blockquote> For greater isolation, you can physically separate the read data from the write data. </blockquote> "you can" doesn't mean "you must". </li> <li> About denormalization and read optimization: Although <blockquote> The read model of a CQRS-based system provides materialized views of the data, typically as highly denormalized views </blockquote> the key point is <blockquote> the read database can use its own data schema that is optimized for queries </blockquote> It can be a different schema, but it can still be normalized or at least not "highly denormalized". Again - you can, but that doesn't mean you must. More than that, if you performance is poor due to write locks and not because of heavy SQL requests: <blockquote> The read store can be a read-only replica of the write store </blockquote> And when we talk about request's optimization, it's better to talk more about requests themselves, and less about storage types. </li> <li> About "it reads from either" [...] <blockquote> The Materialized View pattern describes generating prepopulated views of data in environments where the source data isn't in a suitable format for querying, where generating a suitable query is difficult, or where query performance is poor due to the nature of the data or the data store. </blockquote> Here the key point is that views are plural. <blockquote> A materialized view can even be optimized for just a single query. ... Materialized views tend to be specifically tailored to one, or a small number of queries </blockquote> So you choice is not between those 3 options. It's much wider actually. And again, you don't need another storage to create views. All can be done inside a single DB. </li> <li> About <blockquote> My gut says the normalized (At least to some extent) data in Table Storage would be the way to go; however I am worried it will reduce the performance gains to conduct 3 queries in order to grab all the data for a user. </blockquote> Yes, of course, performance will suffer! (Also consider the matter of consistency). But will it be OK or not you can never be sure until you test it. With your data and your requests. Because delays in data transfers can actually be less than time required for some elaborate SQL-request. </li> </ol> So all boils down to: <ol> <li>What features do you need and which of them Table Storage and/or SQL Azure have?</li> <li>And then, how much will it cost?</li> </ol> These you can only answer yourself. And these choices have little to do with performance. Because if there is a suitable index in either of those, I believe the performance will be virtually indistinguishable. To sum up: <blockquote> SQL Azure or Azure Table Storage? </blockquote> For different requests and data you can and you probably should use both. But there is too little information in the question to give you the exact answer (we need an exact request for that). But I agree with @HerveRoggero - most probably you should stick with SQL Azure.

Do I use Azure Table Storage or SQL Azure for our CQRS Read System?

Tags:

azure-sql-database

cqrs

azure

azure-storage

azure-table-storage

We are about to implement the Read portion of our CQRS system in-house with the goal being to vastly improve our read performance. Currently our reads are conducted through a web service which runs a Linq-to-SQL query against normalised data, involving some degree of deserialization from an SQL Azure database.

The simplified structure of our data is:

User
Conversation (Grouping of Messages to the same recipients)
Message
Recipients (Set of Users)

I want to move this into a denormalized state, so that when a user requests to see a feed of messages it reads from EITHER:

A denormalized representation held in Azure Table Storage

UserID as the PartitionKey
ConversationID as the RowKey
Any volatile data prone to change stored as entities
The messages serialized as JSON in an entity
The recipients of said messages serialized as JSON in an entity
The main problem with this the limited size of a row in Table Storage (960KB)
Also any queries on the "volatile data" columns will be slow as they aren't part of the key

A normalized representation held in Azure Table Storage

Different table for Conversation details, Messages and Recipients
Partition keys for message and recipients stored on the Conversation table.
Bar that; this follows the same structure as above
Gets around the maximum row size issue
But will the normalized state reduce the performance gains of a denormalized table?

A denormalized representation held in SQL Azure

UserID & ConversationID held as a composite primary key
Any volatile data prone to change stored in separate columns
The messages serialized as JSON in a column
The recipients of said messages serialized as JSON in an column
Greatest flexibility for indexing and the structure of the denormalized data
Much slower performance than Table Storage queries

What I'm asking is whether anyone has any experience implementing a denormalized structure in Table Storage or SQL Azure, which would you choose? Or is there a better approach I've missed?

My gut says the normalized (At least to some extent) data in Table Storage would be the way to go; however I am worried it will reduce the performance gains to conduct 3 queries in order to grab all the data for a user.

405

asked Jul 09 '12 08:07

Luke Merrett

2 Answers

Your primary driver for considering Azure Tables is to vastly improve read performance, and in your scenario using SQL Azure is "much slower" according to your last point under "A denormalized representation held in SQL Azure". I personally find this very surprising for a few reasons and would ask for detailed analysis on how this claim was made. My default position would be that under most instances, SQL Azure would be much faster.

Here are some reasons for my skepticism of the claim:

SQL Azure uses the native/efficient TDS protocol to return data; Azure Tables use JSON format, which is more verbose
Joins / Filters in SQL Azure will be very fast as long as you are using primary keys or have indexes in SQL Azure; Azure Tables do not have indexes and joins must be performed client side
Limitations in the number of records returned by Azure Tables (1,000 records at a time) means you need to implement multiple roundtrips to fetch many records

Although you can fake indexes in Azure Tables by creating additional tables that hold a custom-built index, you own the responsibility of maintaining that index, which will slow your operations and possibly create orphan scenarios if you are not careful.

Last but not least, using Azure Tables usually makes sense when you are trying to reduce your storage costs (it is cheaper than SQL Azure) and when you need more storage than what SQL Azure can offer (although you can now use Federations to break the single database maximum storage limitation). For example, if you need to store 1 billion customer records, using Azure Table may make sense. But using Azure Tables for increase speed alone is rather suspicious in my mind.

If I were in your shoes I would question that claim very hard and make sure you have expert SQL development skills on staff that can demonstrate you are reaching performance bottlenecks inherent of SQL Server/SQL Azure before changing your architecture entirely.

In addition, I would define what your performance objectives are. Are you looking at 100x faster access times? Did you consider caching instead? Are you using indexing properly in your database?

My 2 cents... :)

answered Oct 19 '22 10:10

Herve Roggero

I won't try to argue on the exact definition of CQRS. As we are talking about Azure, I'll use it's docs as a reference. From there we can find that:

CQRS doesn't necessary requires that you use a separate read storage.

For greater isolation, you can physically separate the read data from the write data.

"you can" doesn't mean "you must".
About denormalization and read optimization:

Although

The read model of a CQRS-based system provides materialized views of the data, typically as highly denormalized views

the key point is

the read database can use its own data schema that is optimized for queries

It can be a different schema, but it can still be normalized or at least not "highly denormalized". Again - you can, but that doesn't mean you must.

More than that, if you performance is poor due to write locks and not because of heavy SQL requests:

The read store can be a read-only replica of the write store

And when we talk about request's optimization, it's better to talk more about requests themselves, and less about storage types.
About "it reads from either" [...]

The Materialized View pattern describes generating prepopulated views of data in environments where the source data isn't in a suitable format for querying, where generating a suitable query is difficult, or where query performance is poor due to the nature of the data or the data store.

Here the key point is that views are plural.

A materialized view can even be optimized for just a single query.

...

Materialized views tend to be specifically tailored to one, or a small number of queries

So you choice is not between those 3 options. It's much wider actually. And again, you don't need another storage to create views. All can be done inside a single DB.
About

My gut says the normalized (At least to some extent) data in Table Storage would be the way to go; however I am worried it will reduce the performance gains to conduct 3 queries in order to grab all the data for a user.

Yes, of course, performance will suffer! (Also consider the matter of consistency). But will it be OK or not you can never be sure until you test it. With your data and your requests. Because delays in data transfers can actually be less than time required for some elaborate SQL-request.

So all boils down to:

What features do you need and which of them Table Storage and/or SQL Azure have?
And then, how much will it cost?

These you can only answer yourself. And these choices have little to do with performance. Because if there is a suitable index in either of those, I believe the performance will be virtually indistinguishable.

To sum up:

SQL Azure or Azure Table Storage?

For different requests and data you can and you probably should use both. But there is too little information in the question to give you the exact answer (we need an exact request for that). But I agree with @HerveRoggero - most probably you should stick with SQL Azure.

answered Oct 19 '22 08:10

x00

Related questions
                            
                                Is it possible to edit web.config of cloud app deployed on windows Azure without redeploying my app?
                            
                                What is the difference between Azure Container Instances and Web App for Containers?
                            
                                How to create a csr file for azure web app
                            
                                How can a script access Service Connections? (Azure Devops Pipelines)
                            
                                StackExchange.Redis with Azure Redis is unusably slow or throws timeout errors
                            
                                Choosing between .NET Service Bus Queues vs Azure Queue Service [closed]
                            
                                What is the difference between Azure IoT Hub and Azure IoT Central?
                            
                                The data protection operation was unsuccessful on Azure using OWIN / Katana
                            
                                Server farm (service plan) SKUs
                            
                                How to import Azure SQL backup (.bacpac) to LocalDB using Visual Studio?
                            
                                Azure key vault: access denied
                            
                                Getting site under construction message after azure webapp deployment
                            
                                Windows Azure Virtual Machine - Opening a port
                            
                                How to create a kubectl config file for serviceaccount
                            
                                Error : The resource you are looking for has been removed, had its name changed, or is temporarily unavailable
                            
                                Client with IP address is not allowed to access the server Azure SQL database
                            
                                Azure Blob always downloads when navigating to url
                            
                                Azure Storage Emulator fails after adding VS 2015 SDK
                            
                                Azure storage: Uploaded files with size zero bytes
                            
                                The resource is not defined in the template

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With