I have an Azure Search service that searches blobs (which are images) based on blob metadata.
The indexer that populates the index is scheduled to run hourly.
However, search results still include blobs that no longer exist.
The Get Indexer Status API (output below) shows that the indexer ran successfully after the blobs were deleted:
"status": "running",
"lastResult": {
    "status": "success",
    "errorMessage": null,
    "startTime": "2018-02-05T16:00:03.29Z",
    "endTime": "2018-02-05T16:00:03.416Z",
    "errors": [],
    "warnings": [],
    "itemsProcessed": 0,
    "itemsFailed": 0,
    "initialTrackingState": "{\r\n \"lastFullEnumerationStartTime\": \"2018-02-05T14:59:31.966Z\",\r\n \"lastAttemptedEnumerationStartTime\": \"2018-02-05T14:59:31.966Z\",\r\n \"nameHighWaterMark\": null\r\n}",
    "finalTrackingState": "{\"LastFullEnumerationStartTime\":\"2018-02-05T15:59:33.2900956+00:00\",\"LastAttemptedEnumerationStartTime\":\"2018-02-05T15:59:33.2900956+00:00\",\"NameHighWaterMark\":null}"
},
If it's relevant, the blobs were deleted using Azure Storage Explorer.
This causes two problems: the images are output to a web page, where they currently display as missing images, and the index is bigger than it needs to be.
While Soft Delete is an option, you can also modify the index targeted by the indexer directly.
The POST to index API (the docs/index endpoint) lets you delete documents directly by their "key" field. An example is below:
POST https://[service name].search.windows.net/indexes/[index name]/docs/index?api-version=[api-version]
Content-Type: application/json
api-key: [admin key]

{
    "value": [
        {
            "@search.action": "delete",
            "key_field_name": "value"
        }
    ]
}
Assuming you didn't use field mappings to change the default "key" behavior of blob indexers, the key field will be the base64-encoded value of the metadata_storage_path property (see the blob indexer documentation for details). Therefore, when you delete a blob, you can have a trigger POST the appropriate payload to the search index from which the document should be removed.
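As a rough sketch of what such a trigger might send, the Python below derives a candidate key from a blob URL and builds the delete payload shown above. The function names url_token_encode and build_delete_batch are made up for illustration, and the encoding is an assumption: it mimics the .NET HttpServerUtility.UrlTokenEncode variant (URL-safe alphabet, padding replaced by a digit count), which may not match how your indexer encoded its keys. Compare the output against a key actually stored in your index before relying on it.

```python
import base64
import json
from typing import Dict, List


def url_token_encode(text: str) -> str:
    """Encode text the way .NET's HttpServerUtility.UrlTokenEncode does.

    ASSUMPTION: the index key was produced with this variant; verify it
    against a real key from your index.
    """
    raw = text.encode("utf-8")
    if not raw:
        return ""
    encoded = base64.urlsafe_b64encode(raw).decode("ascii")
    stripped = encoded.rstrip("=")
    # The trailing digit records how many '=' padding characters were removed.
    return stripped + str(len(encoded) - len(stripped))


def build_delete_batch(key_field: str, keys: List[str]) -> Dict[str, list]:
    """Build the JSON body for POST /indexes/[index name]/docs/index."""
    return {"value": [{"@search.action": "delete", key_field: k} for k in keys]}


if __name__ == "__main__":
    # Hypothetical blob URL, used only to show the shape of the output.
    key = url_token_encode("https://example.blob.core.windows.net/my-container/cat.png")
    print(json.dumps(build_delete_batch("key_field_name", [key]), indent=2))
```

The payload builder is independent of the encoding, so if your keys turn out to be encoded differently you can swap in the real key values and keep the rest.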
After some reading, I found that the only deletion detection policy currently supported by Azure Search is Soft Delete.
To enable this for blob storage, you have to add a metadata property to each blob (e.g. IsDeleted) and set it to the marker value so that the deletion policy picks it up. The policy is configured on the data source:
PUT https://[service name].search.windows.net/datasources/blob-datasource?api-version=2016-09-01
Content-Type: application/json
api-key: [admin key]

{
    "name" : "blob-datasource",
    "type" : "azureblob",
    "credentials" : { "connectionString" : "<your storage connection string>" },
    "container" : { "name" : "my-container", "query" : "my-folder" },
    "dataDeletionDetectionPolicy" : {
        "@odata.type" : "#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy",
        "softDeleteColumnName" : "IsDeleted",
        "softDeleteMarkerValue" : "true"
    }
}
Full details here
I'll need to do some testing to ensure that it is safe to update the metadata and then immediately delete the BLOB.
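On the ordering question: the indexer can only act on the IsDeleted marker if the blob still exists when the next run enumerates it, so deleting immediately after updating the metadata would most likely leave the document in the index. One cautious approach is to purge a blob only after an indexer run that started after the marker was set has succeeded. The sketch below checks that condition against the lastResult object from Get Indexer Status. parse_utc and safe_to_purge are hypothetical helpers, and the rule itself is an assumption about a reasonable safety margin rather than documented service behaviour.

```python
from datetime import datetime, timezone


def parse_utc(timestamp: str) -> datetime:
    """Parse timestamps such as '2018-02-05T16:00:03.29Z' into aware datetimes."""
    head, _, fraction = timestamp.rstrip("Z").partition(".")
    # Pad or truncate the fractional part to microseconds.
    micros = int((fraction + "000000")[:6]) if fraction else 0
    parsed = datetime.strptime(head, "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(microsecond=micros, tzinfo=timezone.utc)


def safe_to_purge(marked_at: datetime, last_result: dict) -> bool:
    """Return True if a successful indexer run started after the blob was marked.

    ASSUMPTION: last_result is the 'lastResult' object from Get Indexer Status.
    """
    if last_result.get("status") != "success" or last_result.get("itemsFailed", 0):
        return False
    return parse_utc(last_result["startTime"]) > marked_at


if __name__ == "__main__":
    last_result = {"status": "success", "startTime": "2018-02-05T16:00:03.29Z", "itemsFailed": 0}
    marked_at = datetime(2018, 2, 5, 15, 30, tzinfo=timezone.utc)
    print(safe_to_purge(marked_at, last_result))
```

With an hourly schedule this means a marked blob would linger for up to an hour before it is physically removed, which may or may not be acceptable for your web page.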
Here is a solution I implemented for removing blobs from an Azure Search data source together with their documents in the index.
The dictionary key is the container name, and the value is the list of files in that container.
Here is a code sample:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.Search;
using Microsoft.Azure.Search.Models;
using Microsoft.WindowsAzure.Storage.Blob;

// Deletes each listed blob and removes its document from the search index.
// listOfFiles maps a container name to the file names in that container.
public async Task<bool> RemoveFilesAsync(Dictionary<string, List<string>> listOfFiles)
{
    CloudBlobClient cloudBlobClient = searchConfig.CloudBlobClient;
    foreach (var container in listOfFiles)
    {
        var fileIds = new List<string>();
        CloudBlobContainer stagingBlobContainer = cloudBlobClient.GetContainerReference(container.Key);
        foreach (var file in container.Value)
        {
            CloudBlockBlob stagingBlob = stagingBlobContainer.GetBlockBlobReference(file);
            var parameters = new SearchParameters()
            {
                Select = new[] { "id", "fileName" }
            };
            // Look up the index document whose fileName matches this blob (case-insensitive).
            var results = await searchConfig.IndexClient.Documents.SearchAsync<Document>(file, parameters);
            var fileDetails = results.Results.FirstOrDefault(p =>
                string.Equals(p?.Document["fileName"]?.ToString(), file, StringComparison.OrdinalIgnoreCase));
            if (fileDetails != null)
                fileIds.Add(fileDetails.Document["id"]?.ToString());
            await stagingBlob.DeleteAsync();
        }
        // Delete the matched documents from the search index; skip the call if nothing matched.
        if (fileIds.Count > 0)
        {
            var batch = IndexBatch.Delete("id", fileIds);
            await searchConfig.IndexClient.Documents.IndexWithHttpMessagesAsync(batch);
        }
    }
    return true;
}
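If a container holds many files, the list of keys passed to a single delete batch can grow large. To the best of my knowledge the service caps the number of actions per indexing request at 1000, so it may be worth splitting the keys first. A language-neutral sketch in Python, where chunk_keys and MAX_ACTIONS_PER_BATCH are illustrative names and the limit should be checked against the current service limits:

```python
from typing import Iterator, List

MAX_ACTIONS_PER_BATCH = 1000  # ASSUMPTION: verify against current service limits


def chunk_keys(keys: List[str], size: int = MAX_ACTIONS_PER_BATCH) -> Iterator[List[str]]:
    """Yield consecutive slices of keys, each with at most `size` entries."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(keys), size):
        yield keys[start:start + size]


if __name__ == "__main__":
    ids = [f"doc-{i}" for i in range(2500)]
    # Each slice would become one delete batch.
    print([len(batch) for batch in chunk_keys(ids)])  # prints [1000, 1000, 500]
```

The same idea carries over to the C# version: build one IndexBatch.Delete call per slice instead of one per container.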