I have a situation where a user attaches files within an application and those files are then persisted to Azure Blob storage. There is a reasonable likelihood of duplicates, and I want to put a solution in place that avoids storing duplicate blobs.
My first thought was to name the blob filename_hash, but that only catches a subset of duplicates (the same content uploaded under a different file name would be missed), so my next thought was filesize_hash.
In doing this, though, it seems I lose some of the flexibility blob storage gives me to represent the file's position in a hierarchy; see: Windows Azure: How to create sub directory in a blob container
So I looked for a way to create a blob that references another blob's data, i.e. some form of symbolic link, but couldn't find what I wanted.
Am I missing something, or should I just go with the filesize_hash method and store my hierarchy by some alternative means?
No, there are no symbolic links (source: http://social.msdn.microsoft.com/Forums/vi-VN/windowsazuredata/thread/6e5fa93a-0d09-44a8-82cf-a3403a695922).
A good solution depends on the anticipated size of the files and the number of duplicates. If there aren't going to be many duplicates, or the files are small, then it may actually be quicker and cheaper to live with it - $0.15 per gigabyte per month is not a great deal to pay, compared to the development cost! (That's the approach we're taking.)
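To put a rough number on that trade-off, here is a back-of-the-envelope sketch. The function name is made up for illustration, and the $0.15 default is just the price quoted above; actual pricing depends on your account and region, so plug in your own figure.

```python
def monthly_duplicate_cost(duplicate_gb: float, price_per_gb_month: float = 0.15) -> float:
    """Estimate what storing redundant copies costs per month.

    duplicate_gb: total size of the redundant copies, in gigabytes.
    price_per_gb_month: storage price; 0.15 is the figure quoted above,
    not a current price list value.
    """
    return duplicate_gb * price_per_gb_month


if __name__ == "__main__":
    # 100 GB of duplicated attachments at the quoted price.
    print(f"${monthly_duplicate_cost(100):.2f} per month")
```

If that figure is small next to a few days of developer time, living with the duplicates is probably the pragmatic choice.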
If it were worthwhile to remove duplicates, I'd use table storage to create some kind of redirection between the file name and the actual location of the data. I'd then do a client-side redirect so the client's browser downloads the proper blob.
If you do this you'll want to preserve the file name (as that is what's visible to the user), but you can call the "folder" location whatever you want.
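Here is a minimal sketch of that redirection idea. It deliberately does not call the Azure SDK: the two dictionaries are in-memory stand-ins for the blob container and the table, and `DedupStore`, `attach`, `resolve` and `download` are hypothetical names chosen for illustration. SHA-256 is also my assumption; the question does not say which hash is meant.

```python
import hashlib


class DedupStore:
    """In-memory stand-in for a blob container plus a redirection table."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}  # content key -> data, one copy per unique content
        self.index: dict[str, str] = {}    # user-visible path -> content key

    @staticmethod
    def content_key(data: bytes) -> str:
        # The filesize_hash naming scheme from the question.
        return f"{len(data)}_{hashlib.sha256(data).hexdigest()}"

    def attach(self, path: str, data: bytes) -> str:
        """Record `path` in the index, uploading the data only if it is new."""
        key = self.content_key(data)
        if key not in self.blobs:
            self.blobs[key] = data
        self.index[path] = key
        return key

    def resolve(self, path: str) -> str:
        """Return the content key to redirect the client to."""
        return self.index[path]

    def download(self, path: str) -> bytes:
        return self.blobs[self.resolve(path)]


if __name__ == "__main__":
    store = DedupStore()
    store.attach("invoices/2012/report.pdf", b"same bytes")
    store.attach("archive/copy-of-report.pdf", b"same bytes")
    print(len(store.index), "paths,", len(store.blobs), "stored blob(s)")
```

The hierarchy lives entirely in the index keys, so the user still sees their own file name and "folder", while the container holds one blob per distinct content. In a real deployment `resolve` would back the redirect to the blob's URL, and deleting a path would need some form of reference counting before the underlying blob can be removed.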