
Using Azure Storage Tables as Queues with multiple Worker Roles processing it?

My application will be receiving 1000+ requests/transactions every second via multiple instances of the Web Role. These roles will write a record for every transaction across multiple Storage Tables (chosen randomly, to spread the load under Azure's 500 transactions/sec limit). Now I need a reliable way to process/aggregate this data using multiple Worker Roles and write the results to my SQL database. In other words, this needs to scale horizontally.
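The random table-sharding idea above can be sketched minimally; the shard count and table names here are hypothetical, not part of any real setup:

```python
import random

# Hypothetical shard count; each shard is a separate Storage Table so that
# writes stay under the per-table throughput limit mentioned above.
NUM_SHARDS = 10
TABLE_NAMES = [f"Transactions{i:02d}" for i in range(NUM_SHARDS)]

def pick_table(table_names):
    """Choose a table at random to spread write load across shards."""
    return random.choice(table_names)
```

Each Web Role instance would call `pick_table` per transaction, so writes are distributed roughly evenly without any coordination between instances.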

I need to retain/archive all of the transactions in the Storage Tables post-processing. I could use one set of tables as queues and move records to archive tables once they are processed, or perhaps there is a way to do this in a single table; I'm not sure.

What would you recommend as a mechanism to distribute the current workload in these queues across my Worker Roles? Obviously, each role has to be aware of what every other role is working on, so they only work on unclaimed transactions. I expect each role to retrieve around 1000 records from the queue as a single workload, with multiple worker roles working on the same queue.

Should I keep the Worker Roles' "state" in a cache, or perhaps in SQL Server?

Your suggestions are much appreciated.

asked Dec 15 '22 by enlightenedOne


2 Answers

I recommend you use a proper queue service to implement this feature instead of trying to implement queueing over the table service. This way you won't have to implement complex logic to know which records have been processed (logic that becomes difficult when you consider fault tolerance and possible errors, especially in a service such as Table Storage that has a very limited transaction capability). Trying to coordinate multiple workers reliably, accounting for all possible failure scenarios, and being scalable at the same time is something I wouldn't attempt at application level.

For instance:

  1. The web role receives a request that represents a transaction.
  2. The web role writes data to several tables.
  3. The web role sends a message to the queue service representing the transaction with some unique ID (for instance the request ID, if there isn't another suitable primary key).
  4. The worker role pulls messages from the queue.
  5. For each message, the worker role retrieves the set of objects from table storage corresponding to the unique identifier of the message.
  6. The worker role aggregates the data as needed and writes it to SQL Database.
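The six steps above can be simulated end to end with in-memory stand-ins for the queue, the tables, and the SQL database. This is a sketch of the flow only; the names are illustrative and none of these are real Azure SDK calls:

```python
from collections import deque

queue = deque()          # stands in for the queue service
table_storage = {}       # stands in for table storage: tx_id -> detail rows
sql_database = {}        # stands in for SQL Database: tx_id -> aggregate

def web_role_handle_request(tx_id, rows):
    """Steps 1-3: persist detail rows, then enqueue a message with the ID."""
    table_storage[tx_id] = rows      # step 2: write data to tables
    queue.append(tx_id)              # step 3: enqueue the transaction ID

def worker_role_process_one():
    """Steps 4-6: pull a message, fetch its rows, aggregate, write to SQL."""
    tx_id = queue.popleft()          # step 4: pull a message
    rows = table_storage[tx_id]      # step 5: retrieve rows by unique ID
    sql_database[tx_id] = sum(rows)  # step 6: aggregate and persist

web_role_handle_request("req-1", [10, 20, 30])
worker_role_process_one()
```

The key point is that the queue message carries only the transaction ID; the detail rows live in table storage (where they are also archived), and the worker joins the two by that ID.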

Notes:

  1. Use either Azure Storage Queues or Azure Service Bus queues.
  2. Spread the load among many queues for scalability.
  3. Be sure to apply proper handling at all levels to account for transient failures.
  4. Deal with the possibility of processing the same message more than once (the processing should be idempotent).
answered Apr 23 '23 by Fernando Correia


I agree with Fernando. Please take a look at my blog post on this very topic; it deals with large-scale processing of Azure Queues and is based on a project I did with higher throughput requirements than the ones you posted.

answered Apr 23 '23 by Herve Roggero