How to avoid a 404 when deleting batches from Azure Table Storage

Problem

I am trying to delete a lot of rows from table storage that may or may not exist. The deal is I need to minimise I/O and maximise bandwidth, so one hit to rule them all would be awesome. The problem is that if any of the entities in the batch doesn't exist, the whole batch fails.

Why

This also brings me to a design question - why doesn't the request simply return a deletion result indicating which objects were not deleted due to a 404? Why does it throw an exception? What is the reason for it?

More info

The batch size is within the Table Storage constraint of 100 entities, and they are all within the same partition.
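For concreteness, here is roughly what I'm doing, sketched with the Python azure-data-tables package (the connection string, table name and keys are placeholders):

```python
# A sketch of the failing scenario using the Python azure-data-tables package.
# The connection string, table name and keys below are placeholders.
from azure.data.tables import TableClient, TableTransactionError

CONNECTION_STRING = "<storage-connection-string>"
client = TableClient.from_connection_string(CONNECTION_STRING, table_name="mytable")

# Up to 100 delete operations, all sharing the same PartitionKey.
operations = [("delete", {"PartitionKey": "pk1", "RowKey": str(i)}) for i in range(100)]

try:
    client.submit_transaction(operations)   # one round trip for the whole batch
except TableTransactionError as err:
    # If any single entity does not exist, the service rejects the entire
    # batch with a 404 and nothing gets deleted.
    print("Whole batch failed:", err)
```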

Asked Jun 12 '15 by Wojtek Turowicz




2 Answers

To answer your question, there's no way to avoid this kind of situation. If an entity in the batch fails, the whole batch will fail.

However, there's one thing you could do:

When the batch fails, it returns the index of the failed entity. You can take that batch and split it into three separate batches: the first from the first entity up to (but not including) the failed entity, the second containing just the failed entity, and the third from the entity after the failed one to the last entity. For the failed entity, simply use a DeleteIfExists-style call. So, assuming you have 100 entities in a batch and the 30th entity fails, you would create 3 batches (see the sketch after this list):

Batch 1: 1st to 29th entity (Index 0 - 28)

Batch 2: 30th entity (single entity) (Index 29)

Batch 3: 31st to 100th entity (Index 30 - 99)
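A rough sketch of that split-and-retry idea in Python with azure-data-tables is below. It assumes you have already extracted the zero-based index of the failing operation from the batch error (the service includes it in the error message); exactly how that index is surfaced depends on the SDK and version.

```python
# Split the failed batch around the failing operation and retry, assuming
# `operations` is the original list of ("delete", entity) tuples and
# `failed_index` was extracted from the batch error.
from azure.core.exceptions import ResourceNotFoundError

def retry_around_failed_entity(client, operations, failed_index):
    before = operations[:failed_index]            # Batch 1: everything before the failure
    _, failed_entity = operations[failed_index]   # Batch 2: the single failing delete
    after = operations[failed_index + 1:]         # Batch 3: everything after the failure

    if before:
        client.submit_transaction(before)
    if after:
        client.submit_transaction(after)

    # DeleteIfExists for the lone failing entity: swallow the 404.
    try:
        client.delete_entity(failed_entity["PartitionKey"], failed_entity["RowKey"])
    except ResourceNotFoundError:
        pass  # already gone, which is what we wanted anyway
```

Note that the two retried sub-batches can themselves hit another missing entity, so in practice you would loop or recurse until everything is deleted.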

As for the design question - why doesn't the request simply return a deletion result indicating which objects were not deleted due to a 404, instead of throwing an exception?

One possible reason I can think of is the Storage API's adherence to REST: you try to delete a resource that isn't there, so the API returns an error. Furthermore, an entity can fail to delete not only because it is not present but also because the If-Match conditional header specified in the request does not match. For example, you may want to delete an entity only if its ETag matches; in that case, even though the entity is present, your delete operation would fail. To deal with 404 errors on single-entity delete operations, all client SDKs implement DeleteIfExists-style functionality that swallows the 404 error.
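To illustrate both cases, here is a sketch with the Python azure-data-tables package (the client, keys and etag are assumed to come from your own code):

```python
# A conditional delete (only if the ETag still matches) and a
# DeleteIfExists-style delete that swallows the 404.
from azure.core import MatchConditions
from azure.core.exceptions import ResourceNotFoundError

def delete_if_unchanged(client, pk, rk, etag):
    # Fails with 412 Precondition Failed if the entity was modified since we
    # read it, even though the entity still exists.
    client.delete_entity(pk, rk, etag=etag,
                         match_condition=MatchConditions.IfNotModified)

def delete_if_exists(client, pk, rk):
    try:
        client.delete_entity(pk, rk)
    except ResourceNotFoundError:
        pass  # 404 swallowed: a missing entity is treated as a successful delete
```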

Answered Oct 05 '22 by Gaurav Mantri


You could PUT empty entities with the same PartitionKey and RowKey before DELETEing them. This way you can be sure you won't get 404 errors. It costs two calls for every batch instead of retrying many times and complicating the logic of your app. Not an ideal answer, but we don't live in an ideal world :)
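A sketch of that workaround with the Python azure-data-tables package (the client, partition key and row keys are assumed to come from your own code):

```python
# "Create them first, then delete them": one upsert batch followed by one
# delete batch, so the deletes can never hit a 404. All rows must share the
# same PartitionKey for the batch to be valid.
def delete_batch_without_404(client, partition_key, row_keys):
    entities = [{"PartitionKey": partition_key, "RowKey": rk} for rk in row_keys]

    # First call: make sure every entity exists (an empty placeholder if it didn't).
    client.submit_transaction([("upsert", e) for e in entities])

    # Second call: the whole delete batch now succeeds in one shot.
    client.submit_transaction([("delete", e) for e in entities])
```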

Answered Oct 05 '22 by Antonio Fiumanò