We've been using a hybrid architecture on Windows Azure, storing most entities in a SQL Azure database, but throwing anything that's likely to require significant amounts of storage space into Azure Table Storage.
With this architecture, though, we're running into all sorts of problems with Azure Table Storage, which strikes me as an immature and incomplete product at best. The biggest limitation is that, for all practical purposes, it's a write-only data store. The consensus is that its write capabilities scale very, very well, but its querying and indexing capabilities are so astonishingly limited (despite years of users complaining and Microsoft promising) that I've come to the conclusion you should basically only ever try to retrieve data out of ATS in an emergency. Getting data out of it for a complex, realtime, transactional production app is way more difficult than it should be. There are workarounds, of course, like maintaining multiple copies of data, with different indexing strategies for each copy, or splitting up your queries and running them in parallel, but that's adding complexity when the whole point of a cloud service is to minimize it.
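To make the "multiple copies" workaround concrete, here is a minimal sketch of the idea. It assumes nothing about the real Azure client library: `FakeTable`, `save_order`, and the order fields are hypothetical, in-memory stand-ins for a store that can only look rows up efficiently by partition key and row key.

```python
from collections import defaultdict


class FakeTable:
    """Hypothetical in-memory stand-in for a table keyed by (PartitionKey, RowKey)."""

    def __init__(self):
        self._partitions = defaultdict(dict)

    def upsert(self, partition_key, row_key, entity):
        # Insert or overwrite the entity stored under this key pair.
        self._partitions[partition_key][row_key] = dict(entity)

    def query_partition(self, partition_key):
        # The only cheap query: every row in one partition, ordered by row key.
        rows = self._partitions.get(partition_key, {})
        return [rows[key] for key in sorted(rows)]


# Two copies of the same data, each partitioned for a different query.
orders_by_customer = FakeTable()
orders_by_day = FakeTable()


def save_order(order):
    # Every write goes to both copies; this is the extra complexity
    # the question complains about.
    orders_by_customer.upsert(order["customer_id"], order["order_id"], order)
    orders_by_day.upsert(order["day"], order["order_id"], order)


save_order({"order_id": "o1", "customer_id": "c1", "day": "2013-01-05"})
save_order({"order_id": "o2", "customer_id": "c2", "day": "2013-01-05"})
save_order({"order_id": "o3", "customer_id": "c1", "day": "2013-01-06"})

orders_for_c1 = orders_by_customer.query_partition("c1")
orders_on_jan_5 = orders_by_day.query_partition("2013-01-05")
```

Each query then touches exactly one partition of the copy built for it. The cost is that the application has to keep the copies consistent itself, since a write that reaches one table but not the other leaves them out of sync.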
That said, we're committed to Azure for now, and I would like to have a good sense for what the alternatives and pitfalls are, preferably from folks that have actually been down this road in production.
I'm quite well aware that there are lots of NoSQL options out there (e.g., all the ones listed in this question: What NoSQL solutions are out there for .NET?) that I can run either on a VM or in some other cloud. But I'm specifically interested in knowing whether there are any that fit well into Azure's PaaS model. In other words, if I'm on Azure, and don't want to manage my own VMs, and want something as close as possible to the almost automatic and nearly infinite scalability promised (though never quite delivered) by ATS, what options have people found valuable? Is the MongoDB/Azure wrapper a simple and viable alternative? Or should I just bite the bullet and spin up my own VMs? Or switch over to AWS? Or stick with Azure SQL?
(To give you a sense of our size requirements: we're thinking we'll be needing to store upwards of a billion rows. Not huge, but not negligible either.)
Although Azure table storage does not support secondary indexes, and does not have the feature set of SQL, it is not trying to solve the same problem.
I would avoid SQL Azure (or whatever it's called now) and focus on building a data layer that uses what Azure is good at (blobs, tables, and queues).
We have found table storage to be more than adequate for a large production solution. It has gotten a lot better over the last 18 months or so. The v2 of the .NET client library is much better than v1.
As with most applications, a direct port of the architecture onto a cloud platform is rarely a good idea. Rethinking the way that you have solved previous business problems with a solid understanding of what's available in the cloud is the only path to success.
I agree with a previous post that something like Lucene could be good if you need to index a lot of data. We find that, by using tables and blobs well, we are able to make do without it, but it's definitely an option in your toolbox.
We have been through a similar situation and researched several options, both what Azure itself offers and NoSQL alternatives.
The approach we settled on is to use Azure Blob Storage and Lucene.Net. We serialize our objects to JSON and then save them as Azure blobs.
We use Lucene.Net to create the indexes; a Lucene.Net search returns the information we need to fetch the blobs that contain the data we are looking for. We do not have this combination running in production yet, but in the tests we have done it is working very well.
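The shape of that pattern can be sketched roughly as follows. This is not Lucene.Net or the Azure Blob Storage API: `blob_store`, `inverted_index`, `save_document`, and `search` are hypothetical in-memory stand-ins, with a plain dictionary playing the role of the blob container and a toy inverted index playing the role of the search index.

```python
import json
from collections import defaultdict

# Hypothetical stand-ins: blob name -> JSON text, and token -> blob names.
blob_store = {}
inverted_index = defaultdict(set)


def save_document(blob_name, document, indexed_fields):
    # Serialize the object to JSON and store it as a "blob".
    blob_store[blob_name] = json.dumps(document)
    # Index only the chosen fields; the index stores blob names, not the data.
    for field in indexed_fields:
        for token in str(document.get(field, "")).lower().split():
            inverted_index[token].add(blob_name)


def search(term):
    # Step 1: ask the index which blobs match.
    blob_names = sorted(inverted_index.get(term.lower(), set()))
    # Step 2: fetch and deserialize only those blobs.
    return [json.loads(blob_store[name]) for name in blob_names]


save_document("doc-1", {"id": 1, "title": "Azure table storage"}, ["title"])
save_document("doc-2", {"id": 2, "title": "Blob storage basics"}, ["title"])
save_document("doc-3", {"id": 3, "title": "Lucene indexing"}, ["title"])

storage_hits = search("storage")
lucene_hits = search("Lucene")
```

The point of the split is that the index stays small, because it holds only search terms and blob names, while the bulky payload lives in cheap blob storage. This toy version never removes stale index entries when a blob is overwritten, which a real setup would have to handle.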